Appendix A



DIRECTED EVOLUTION OF NOVEL BINDING PROTEINS 
This is a continuation of Serial No. 08/993,776 
filed December 18, 1997, now pending; which is a 
continuation of Serial No. 08/415,922, filed April 
3, 1995, now U.S. Patent No. 5,837,500; which is a 
continuation of Serial No. 08/009,319, filed January 
26, 1993, now U.S. Patent No. 5,403,484; which is a 
division of Serial No. 07/664,989, filed March 1, 
1991, now U.S. Patent No. 5,223,409; which is a 
continuation-in-part of Serial No. 07/487,063, filed 
March 2, 1990, now abandoned; which is a 
continuation-in-part of Serial No. 07/240,160, filed 
September 2, 1988, now abandoned. 

The prior application(s) set forth above are hereby
incorporated by reference in their entirety.

Cross-reference to Related Applications 

The following related and commonly-owned
applications are also incorporated by reference:

Robert Charles Ladner, Sonia Kosow Guterman,
Rachael Baribault Kent, and Arthur Charles Ley are
named as joint inventors on U.S. S.N. 07/293,980,
filed January 8, 1989, and entitled GENERATION AND
SELECTION OF NOVEL DNA-BINDING PROTEINS AND
POLYPEPTIDES. This application has been assigned to
Protein Engineering Corporation.

Robert Charles Ladner, Sonia Kosow Guterman,
and Bruce Lindsay Roberts are named as joint
inventors on U.S. S.N. 07/470,651, filed 26 January
1990, entitled "PRODUCTION OF NOVEL SEQUENCE-SPECIFIC
DNA-ALTERING ENZYMES", likewise assigned to
Protein Engineering Corp.
BACKGROUND OF THE INVENTION

Field of the Invention 

This invention relates to development of novel 
binding proteins (including mini-proteins) by an
iterative process of mutagenesis, expression,
chromatographic selection, and amplification. In this
process, a gene encoding a potential binding domain,
said gene being obtained by random mutagenesis of a 
limited number of predetermined codons, is fused to a 
genetic element which causes the resulting chimeric 
expression product to be displayed on the outer surface 
of a virus (especially a filamentous phage) or a cell. 
Chromatographic selection is then used to identify 
viruses or cells whose genome includes such a fused 
gene which coded for the protein which bound to the 
chromatographic target.
Information Disclosure Statement 
A. Protein Structure 

The amino acid sequence of a protein determines its 
three-dimensional (3D) structure, which in turn 
determines protein function (EPST63, ANFI73). Shortle
(SHOR85), Sauer and colleagues (PAKU86, REID88a), and
Caruthers and colleagues (EISE85) have shown that some 
residues on the polypeptide chain are more important 
than others in determining the 3D structure of a 
protein. The 3D structure is essentially unaffected by 
the identity of the amino acids at some loci; at other 
loci only one or a few types of amino acid is allowed. 
In most cases, loci where wide variety is allowed have
the amino acid side group directed toward the solvent. 
Loci where limited variety is allowed frequently have 
the side group directed toward other parts of the 
protein. Thus substitutions of amino acids that are 
exposed to solvent are less likely to affect the 3D 
structure than are substitutions at internal loci. (See 
also SCHU79, p169-171 and CREI84, p239-245, 314-315).

The secondary structure (helices, sheets, turns, 
loops) of a protein is determined mostly by local 
sequence. Although certain amino acids have a propensity
to appear in certain "secondary structures," they will be
found from time to time in other structures, and studies
of pentapeptide sequences found in different proteins
have shown that their conformation varies considerably
from one occurrence to the next (KABS84, ARGO87). As a
result, a priori design of proteins to have a particular 
3D structure is difficult. 

Several researchers have designed and synthesized 
proteins de novo (MOSE83, MOSE87, ERIC86). These
designed proteins are small and most have been
synthesized in vitro as polypeptides rather than
genetically. Hecht et al. (HECH90) have produced a
designed protein genetically. Moser et al. state that
design of biologically active proteins is currently
impossible.



B. Protein Binding Activity 

Many proteins bind non-covalently but very tightly 
and specifically to some other characteristic molecules 
(SCHU79, CREI84). In each case the binding results from
complementarity of the surfaces that come into contact:
bumps fit into holes, unlike charges come together, 
dipoles align, and hydrophobic atoms contact other 
hydrophobic atoms. Although bulk water is excluded, 
individual water molecules are frequently found filling 
space in intermolecular interfaces; these waters usually 
form hydrogen bonds to one or more atoms of the protein 
or to other bound water. Thus proteins found in nature 
have not attained, nor do they require, perfect 
complementarity to bind tightly and specifically to 
their substrates. Only in rare cases is there
essentially perfect complementarity; then the binding is
extremely tight (as, for example, avidin binding to
biotin).

C. Protein Engineering 

"Protein engineering" is the art of manipulating 
the sequence of a protein in order to alter its binding 
characteristics. The factors affecting protein binding 
are known (CHOT75, CHOT76, SCHU79, p98-107, and CREI84,
Ch8), but designing new complementary surfaces has
proved difficult. Although some rules have been
developed for substituting side groups (SUTC87b), the
side groups of proteins are floppy and it is difficult
to predict what conformation a new side group will take.
Further, the forces that bind proteins to other
molecules are all relatively weak and it is difficult to
predict the effects of these forces.

Recently, Quiocho and collaborators (QUIO87)
elucidated the structures of several periplasmic binding
proteins from Gram-negative bacteria. They found that
the proteins, despite having low sequence homology and
differences in structural detail, have certain important
structural similarities. Based on their investigations
of these binding proteins, Quiocho et al. suggest it is
unlikely that, using current protein engineering
methods, proteins can be constructed with binding
properties superior to those of proteins that occur
naturally.

Nonetheless, there have been some isolated
successes. Wilkinson et al. (WILK84) reported that a
mutant of the tyrosyl tRNA synthetase of Bacillus
stearothermophilus with the mutation Thr51->Pro exhibits
a 100-fold increase in affinity for ATP. Tan and Kaiser
(TANK77) and Tschesche et al. (TSCH87) showed that
changing a single amino acid in a mini-protein greatly
reduces its binding to trypsin, but that some of the
mutants retained the parental characteristic of binding
to and inhibiting chymotrypsin, while others exhibited
new binding to elastase. Caruthers and others (EISE85)
have shown that changes of single amino acids on the
surface of the lambda Cro repressor greatly reduce its
affinity for the natural operator OR3, but greatly
increase the binding of the mutant protein to a mutant
operator. Changing three residues in subtilisin from
Bacillus amyloliquefaciens to be the same as the
corresponding residues in subtilisin from B.
licheniformis produced a protease having nearly the same
activity as the latter subtilisin, even though
[number illegible] amino acid sequence differences
remained (WELL87a). Insertion of DNA encoding 18 amino
acids (corresponding to Pro-Glu-Dynorphin-Gly) into the
E. coli phoA gene so that the additional amino acids
appeared within a loop of the alkaline phosphatase
protein resulted in a chimeric protein having both phoA
and dynorphin activity (FREI90). Thus, changing the
surface of a binding protein may alter its specificity
without abolishing binding activity.

D. Techniques of Mutagenesis

Early techniques of mutating proteins involved
manipulations at the amino acid sequence level. In the
semisynthetic method (TSCH87), the protein was cleaved
into two fragments, a residue removed from the new end
of one fragment, the substitute residue added on in its
place, and the modified fragment joined with the other,
original fragment. Alternatively, the mutant protein
could be synthesized in its entirety (TANK77).

Ericsson et al. suggested that mixed amino acid
reagents could be used to produce a family of
sequence-related proteins which could then be screened by
affinity chromatography (ERIC86). They envision
successive rounds of mixed synthesis of variant proteins
and purification by specific binding. They do not 
discuss how residues should be chosen for variation. 
Because proteins cannot be amplified, the researchers 
must sequence the recovered protein to learn which 
substitutions improve binding. The researchers must 
limit the level of diversity so that each variety of 
protein will be present in sufficient quantity for the 
isolated fraction to be sequenced. 

With the development of recombinant DNA techniques, 
it became possible to obtain a mutant protein by 
mutating the gene encoding the native protein and then 
expressing the mutated gene. Several mutagenesis
strategies are known. One, "protein surgery" (DILL87),
involves the introduction of one or more predetermined
mutations within the gene of choice. A single
polypeptide of completely predetermined sequence is
expressed, and its binding characteristics are
evaluated.

At the other extreme is random mutagenesis by means 
of relatively nonspecific mutagens such as radiation and 
various chemical agents. See Ho et al. (HOCJ85) and
Lehtovaara, E.P. Appln. 285,123. 

It is possible to randomly vary predetermined 
nucleotides using a mixture of bases in the appropriate 
cycles of a nucleic acid synthesis procedure. The 
proportion of bases in the mixture, for each position of 
a codon, will determine the frequency at which each 
amino acid will occur in the polypeptides expressed from
the degenerate DNA population. Oliphant et al. (OLIP86)
and Oliphant and Struhl (OLIP87) have demonstrated
ligation and cloning of highly degenerate
oligonucleotides, which were used in the mutation of
promoters. They suggested that similar methods could be
used in the variation of protein coding regions. They
do not say how one should: a) choose protein residues
to vary, or b) select or screen mutants with desirable
properties. Reidhaar-Olson and Sauer (REID88a) have
used synthetic degenerate oligo-nts to vary
simultaneously two or three residues through all twenty
amino acids. See also Vershon et al. (VERS86a;
VERS86b). Reidhaar-Olson and Sauer do not discuss the
limits on how many residues could be varied at once nor
do they mention the problem of unequal abundance of DNA
encoding different amino acids. They looked for
proteins that either had wild-type dimerization or that
did not dimerize. They did not seek proteins having
novel binding properties and did not find any. This
approach is likewise limited by the number of colonies
that can be examined (ROBE86).
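The dependence of amino acid frequency on the base mixture used at each codon position, noted at the start of the preceding paragraph, can be illustrated with a short calculation. The following is only a sketch under stated assumptions: it uses the standard genetic code, treats the three codon positions as independent, and the function name `amino_acid_frequencies` and the example mixture are hypothetical, not taken from any cited reference.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def amino_acid_frequencies(position_mixes):
    """Return {amino acid: frequency} for one degenerate codon.

    position_mixes is a list of three dicts, one per codon position,
    each mapping a base to its proportion (as a Fraction) in the mixture.
    """
    for mix in position_mixes:
        if sum(mix.values()) != 1:
            raise ValueError("base proportions at each position must sum to 1")
    freqs = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(*(m.items() for m in position_mixes)):
        aa = CODON_TABLE[b1 + b2 + b3]
        freqs[aa] = freqs.get(aa, Fraction(0)) + p1 * p2 * p3
    return freqs


# Hypothetical mixture: A only, T only, then 3 parts G to 1 part T.
example = amino_acid_frequencies([
    {"A": Fraction(1)},
    {"T": Fraction(1)},
    {"G": Fraction(3, 4), "T": Fraction(1, 4)},
])
```

Under these assumptions the example mixture yields only ATG (Met) and ATT (Ile), in the same 3:1 proportion as the bases supplied at the third position.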

To the extent that this prior work assumes that it 
is desirable to adjust the level of mutation so that 
there is one mutation per protein, it should be noted 
that many desirable protein alterations require multiple 
amino acid substitutions and thus are not accessible 
through single base changes or even through all possible
amino acid substitutions at any one residue.

D. Affinity Chromatography of Cells

Ferenci and collaborators have published a series of
papers on the chromatographic isolation of mutants of
the maltose-transport protein LamB of E. coli (FERE82a,
FERE82b, FERE83, FERE84, CLUN84, HEIN87 and papers cited
therein). The mutants were either spontaneous or
induced with nonspecific chemical mutagens. Levels of
mutagenesis were picked to provide single point
mutations or single insertions of two residues. No
multiple mutations were sought or found.

While variation was seen in the degree of affinity
for the conventional LamB substrates maltose and starch,
there was no selection for affinity to a target molecule
not bound at all by native LamB, and no multiple
mutations were sought or found. FERE84 speculated that
the affinity chromatographic selection technique could
be adapted to development of similar mutants of other
"important bacterial surface-located enzymes", and to
selecting for mutations which result in the relocation
of an intracellular bacterial protein to the cell
surface. Ferenci's mutant surface proteins would not,
however, have been chimeras of a bacterial surface
protein and an exogenous or heterologous binding domain.

Ferenci also taught that there was no need to clone 
the structural gene, or to know the protein structure, 
active site, or sequence. The method of the present 
invention, however, specifically utilizes a cloned
structural gene. It is not possible to construct and
express a chimeric, outer surface-directed potential
binding protein-encoding gene without cloning.

Ferenci did not limit the mutations to particular 
loci or particular substitutions. In the present
invention, knowledge of the protein structure, active
site and/or sequence is used as appropriate to predict
which residues are most likely to affect binding
activity without unduly destabilizing the protein, and
the mutagenesis is focused upon those sites. Ferenci
does not suggest that surface residues should be
preferentially varied. In consequence, Ferenci's
selection system is much less efficient than that
disclosed herein.

E. Bacterial and Viral Expression of Chimeric Surface 
Proteins 

A number of researchers have directed unmutated 
foreign antigenic epitopes to the surface of bacteria or 
phage, fused to a native bacterial or phage surface 
protein, and demonstrated that the epitopes were 
recognized by antibodies. Thus, Charbit et al.
(CHAR86) genetically inserted the C3 epitope of the VP1
coat protein of poliovirus into the LamB outer membrane
protein of E. coli, and determined immunologically that
the C3 epitope was exposed on the bacterial cell
surface. Charbit et al. (CHAR87) likewise produced
chimeras of LamB and the A (or B) epitopes of the preS2
region of hepatitis B virus.

A chimeric LacZ/OmpB protein has been expressed in
E. coli and is, depending on the fusion, directed to
either the outer membrane or the periplasm (SILH77). A
chimeric LacZ/OmpA surface protein has also been
expressed and displayed on the surface of E. coli cells
(Weinstock et al., WEIN83). Others have expressed and
displayed on the surface of a cell chimeras of other
bacterial surface proteins, such as E. coli type 1
fimbriae (Hedegaard and Klemm, HEDE89) and Bacteroides
nodosus type 1 fimbriae (Jennings et al., JENN89). In
none of the recited cases was the inserted genetic
material mutagenized.

Dulbecco (DULB86) suggests a procedure for 
incorporating a foreign antigenic epitope into a viral 
surface protein so that the expressed chimeric protein 
is displayed on the surface of the virus in a manner 
such that the foreign epitope is accessible to antibody. 
In 1985 Smith (SMIT85) reported inserting a
nonfunctional segment of the EcoRI endonuclease gene
into gene III of bacteriophage f1, "in phase". The gene
III protein is a minor coat protein necessary for
infectivity. Smith demonstrated that the recombinant
phage were adsorbed by immobilized antibody raised
against the EcoRI endonuclease, and could be eluted with
acid. De la Cruz et al. (DELA88) have expressed a
fragment of the repeat region of the circumsporozoite
protein as an insert in the gene III protein. They showed
that the recombinant phage were both antigenic and
immunogenic in rabbits, and that such recombinant phage
could be used for B epitope mapping. The researchers
suggest that similar recombinant phage could be used for
T epitope mapping and for vaccine development.

None of these researchers suggested mutagenesis of
the inserted material, nor is the inserted material a
complete binding domain conferring on the chimeric
protein the ability to bind specifically to a receptor
other than the antigen combining site of an antibody.

McCafferty et al. (MCCA90) expressed a fusion of
an Fv fragment of an antibody to the N-terminal of the
pIII protein. The Fv fragment was not mutated.
F. Epitope Libraries on Fusion Phage

Parmley and Smith (PARM88) suggested that an
epitope library that exhibits all possible hexapeptides
could be constructed and used to isolate epitopes that
bind to antibodies. In discussing the epitope library,
the authors did not suggest that it was desirable to
balance the representation of different amino acids.
Nor did they teach that the insert should encode a
complete domain of the exogenous protein. Epitopes are
considered to be unstructured peptides as opposed to
structured proteins.

After the filing of the parent application whose
benefit is claimed herein under 35 U.S.C. 120, certain
groups reported the construction of "epitope libraries."
Scott and Smith (SCOT90) and Cwirla et al. (CWIR90)
prepared "epitope libraries" in which potential
hexapeptide epitopes for a target antibody were randomly
mutated by fusing degenerate oligonucleotides, encoding
the epitopes, with gene III of fd phage, and expressing
the fused gene in phage-infected cells. The cells
manufactured fusion phage which displayed the epitopes
on their surface; the phage which bound to immobilized
antibody were eluted with acid and studied. In both
cases, the fused gene featured a segment encoding a
spacer region to separate the variable region from the
wild type pIII sequence so that the varied amino acids
would not be constrained by the nearby pIII sequence.
Devlin et al. (DEVL90) similarly screened, using M13
phage, for random 15 residue epitopes recognized by
streptavidin. Again, a spacer was used to move the
random peptides away from the rest of the chimeric phage
protein. These references therefore taught away from
constraining the conformational repertoire of the
mutated residues.

Another problem with the Scott and Smith, Cwirla et
al., and Devlin et al. libraries was that they provided
a highly biased sampling of the possible amino acids at
each position. Their primary concern in designing the
degenerate oligonucleotide encoding their variable
region was to ensure that all twenty amino acids were
encodible at each position; a secondary consideration
was minimizing the frequency of occurrence of stop
signals. Consequently, Scott and Smith and Cwirla et
al. employed NNK (N=equal mixture of G, A, T, C; K=equal
mixture of G and T) while Devlin et al. used NNS
(S=equal mixture of G and C). There was no attempt to
minimize the frequency ratio of most favored to least
favored amino acid, or to equalize the rate of
occurrence of acidic and basic amino acids.
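The bias described above can be checked by enumerating the codons each scheme permits. The sketch below assumes the standard genetic code and the equal-mixture definitions of N, K and S given in the text; the helper names `scheme_frequencies` and `summarize`, and the choice to count only Asp/Glu as acidic and Lys/Arg as basic, are illustrative assumptions rather than anything stated in the cited references.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Equal mixtures, as defined in the text: N = G/A/T/C, K = G/T, S = G/C.
MIXTURES = {"N": "GATC", "K": "GT", "S": "GC"}


def scheme_frequencies(scheme):
    """Frequency of each amino acid (and stop, '*') for a scheme such as 'NNK'."""
    counts = Counter(
        CODON_TABLE["".join(codon)]
        for codon in product(*(MIXTURES[symbol] for symbol in scheme))
    )
    total = sum(counts.values())
    return {aa: Fraction(n, total) for aa, n in counts.items()}


def summarize(scheme):
    """Collect the quantities the text says were left unoptimized."""
    freqs = scheme_frequencies(scheme)
    coding = {aa: f for aa, f in freqs.items() if aa != "*"}
    return {
        "amino_acids_encoded": len(coding),
        "stop_fraction": freqs.get("*", Fraction(0)),
        "most_to_least_ratio": max(coding.values()) / min(coding.values()),
        "acidic_DE": coding.get("D", 0) + coding.get("E", 0),
        "basic_KR": coding.get("K", 0) + coding.get("R", 0),
    }


for scheme in ("NNN", "NNK", "NNS"):
    print(scheme, summarize(scheme))
```

Under these assumptions NNK and NNS give identical amino acid distributions: all twenty amino acids are encodible, one codon in 32 is a stop, the most favored amino acids (Leu, Ser, Arg) appear three times as often as the least favored, and Lys plus Arg appear twice as often as Asp plus Glu.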

Devlin et al. characterized several affinity-selected
streptavidin-binding peptides, but did not
measure the affinity constants for these peptides.
Cwirla et al. did determine the affinity constants for
their peptides, but were disappointed to find that their
best hexapeptides had affinities (350-300nM), "orders of
magnitude" weaker than that of the native
Met-enkephalin epitope (7nM) recognized by the target
antibody. Cwirla et al. speculated that phage bearing
peptides with higher affinities remained bound under
acidic elution, possibly because of multivalent
interactions between phage (carrying about 4 copies of
pIII) and the divalent target IgG. Scott and Smith were
able to find peptides whose affinity for the target
antibody (A2) was comparable to that of the reference
myohemerythrin epitope (50nM). However, Scott and Smith
likewise expressed concern that some high-affinity
peptides were lost, possibly through irreversible
binding of fusion phage to target.
G. Non-Commonly Owned Patents and Applications Naming
Robert Ladner as an Inventor

Ladner, US Patent No. 4,704,692, "Computer Based 
System and Method for Determining and Displaying 
Possible Chemical Structures for Converting Double- or 
Multiple-Chain Polypeptides to Single-Chain
Polypeptides" describes a design method for converting
proteins composed of two or more chains into proteins of 
fewer polypeptide chains, but with essentially the same 
3D structure. There is no mention of variegated DNA and 
no genetic selection. Ladner and Bird, WO88/01649 
(Publ. March 10, 1988) disclose the specific application 
of computerized design of linker peptides to the 
preparation of single chain antibodies. 

Ladner, Glick, and Bird, WO88/06630 (publ. 7 Sept. 
1988 and having priority from US application 07/021,046, 
assigned to Genex Corp.) (LGB) speculate that diverse 
single chain antibody domains (SCAD) may be screened for 
binding to a particular antigen by varying the DNA 
encoding the combining determining regions of a single 
chain antibody, subcloning the SCAD gene into the gpV 
gene of phage lambda so that a SCAD/gpV chimera is 
displayed on the outer surface of phage lambda, and 
selecting phage which bind to the antigen through 
affinity chromatography. The only antigen mentioned is 
bovine growth hormone. No other binding molecules, 
targets, carrier organisms, or outer surface proteins 
are discussed. Nor is there any mention of the method 
or degree of mutagenesis. Furthermore, there is no
teaching as to the exact structure of the fusion nor of 
how to identify a successful fusion or how to proceed if 
the SCAD is not displayed. 

Ladner and Bird, WO88/06601 (publ. 7 September
1988) suggest that single chain "pseudodimeric"
repressors (DNA-binding proteins) may be prepared by
mutating a putative linker peptide followed by in vivo
selection, and that mutation and selection may be used to
create a dictionary of recognition elements for use in
the design of asymmetric repressors. The repressors are
not displayed on the outer surface of an organism.

Methods of identifying residues in protein which 
can be replaced with a cysteine in order to promote the 
formation of a protein-stabilizing disulfide bond are 
given in Pantoliano and Ladner, U.S. Patent No. 
4,903,773 (PANT90), Pantoliano and Ladner (PANT87),
Pabo and Suchenek (PABO86), MATS89, and SAUE86.

No admission is made that any cited reference is 
prior art or pertinent prior art, and the dates given 
are those appearing on the reference and may not be 
identical to the actual publication date. All
references cited in this specification are hereby 
incorporated by reference. 
SUMMARY OF THE INVENTION

The present invention is intended to overcome the 
deficiencies discussed above. It relates to the
construction, expression, and selection of mutated genes
that specify novel proteins with desirable binding
properties, as well as these proteins themselves. The
substances bound by these proteins, hereinafter referred
to as "targets", may be, but need not be, proteins.
Targets may include other biological or synthetic 
macromolecules as well as other organic and inorganic 
substances. 

The fundamental principle of the invention is one 
of forced evolution. In nature, evolution results from
the combination of genetic variation, selection for 
advantageous traits, and reproduction of the selected 
individuals, thereby enriching the population for the 
trait. The present invention achieves genetic variation 
through controlled random mutagenesis ("variegation") of
DNA, yielding a mixture of DNA molecules encoding 
different but related potential binding proteins. It 
selects for mutated genes that specify novel proteins 
with desirable binding properties by 1) arranging that 
the product of each mutated gene be displayed on the 
outer surface of a replicable genetic package (GP) (a 
cell, spore or virus) that contains the gene, and 2) 
using affinity selection -- selection for binding to the
target material -- to enrich the population of packages
for those packages containing genes specifying proteins
with improved binding to that target material. Finally,
enrichment is achieved by allowing only the genetic 
packages which, by virtue of the displayed protein, 
bound to the target, to reproduce. The evolution is 
"forced" in that selection is for the target material 
provided. 

The display strategy is first perfected by 
modifying a genetic package to display a stable, 
structured domain (the "initial potential binding
domain", IPBD) for which an affinity molecule (which may
be an antibody) is obtainable. The success of the
modifications is readily measured by, e.g., determining
whether the modified genetic package binds to the 
affinity molecule. 

The IPBD is chosen with a view to its tolerance for 
extensive mutagenesis. Once it is known that the IPBD 
can be displayed on a surface of a package and subjected 
to affinity selection, the gene encoding the IPBD is 
subjected to a special pattern of multiple mutagenesis, 
here termed "variegation", which after appropriate 
cloning and amplification steps leads to the production 
of a population of genetic packages each of which 
displays a single potential binding domain (a mutant of 
the IPBD), but which collectively display a multitude of
different though structurally related potential binding
domains (PBDs). Each genetic package carries the
version of the pbd gene that encodes the PBD displayed
on the surface of that particular package. Affinity
selection is then used to identify the genetic packages
bearing the PBDs with the desired binding 
characteristics, and these genetic packages may then be 
amplified. After one or more cycles of enrichment by 
affinity selection and amplification, the DNA encoding 
the successful binding domains (SBDs) may then be 
recovered from selected packages. 
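The effect of repeated cycles of affinity selection and amplification can be illustrated with a toy calculation. This is only a sketch under simplifying assumptions that are not part of the disclosure: the population contains just two kinds of genetic package, each kind is retained on the target with a fixed probability per round, amplification restores the population without changing relative proportions, and all numerical values are hypothetical.

```python
def select_and_amplify(fractions, retention):
    """One cycle: retain each variant in proportion to its retention
    probability, then amplify the survivors back to a full population."""
    retained = {v: fractions[v] * retention[v] for v in fractions}
    total = sum(retained.values())
    return {v: amount / total for v, amount in retained.items()}


def enrich(fractions, retention, rounds):
    """Return the population composition before and after each round."""
    history = [dict(fractions)]
    for _ in range(rounds):
        fractions = select_and_amplify(fractions, retention)
        history.append(fractions)
    return history


# Hypothetical values: one binder per million packages, retained 500 times
# more efficiently than the non-binding background.
start = {"binder": 1e-6, "background": 1 - 1e-6}
retention = {"binder": 0.5, "background": 0.001}

history = enrich(start, retention, rounds=3)
for n, population in enumerate(history):
    print(f"round {n}: binder fraction = {population['binder']:.3g}")
```

With these assumed numbers the binder rises from one part per million to the majority of the population within three cycles, which is why a modest per-round enrichment factor can suffice when cycles are repeated.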

If need be, the DNA from the SBD-bearing packages 
may then be further "variegated", using an SBD of the
last round of variegation as the "parental potential
binding domain" (PPBD) to the next generation of PBDs,
and the process continued until the worker in the art is
satisfied with the result. At that point, the SBD may
be produced by any conventional means, including
chemical synthesis.

When the number of different amino acid sequences 
obtainable by mutation of the domain is large when 
compared to the number of different domains which are 
displayable in detectable amounts, the efficiency of the 
forced evolution is greatly enhanced by careful choice 
of which residues are to be varied. First, residues of 
a known protein which are likely to affect its binding 
activity (e.g., surface residues) and not likely to
unduly degrade its stability are identified. Then all 
or some of the codons encoding these residues are varied 
simultaneously to produce a variegated population of 
DNA. The variegated population of DNA is used to 
express a variety of potential binding domains, whose 
ability to bind the target of interest may then be
evaluated. 

The method of the present invention is thus further 
distinguished from other methods in the nature of the 
highly variegated population that is produced and from 
which novel binding proteins are selected. We force the 
displayed potential binding domain to sample the nearby 
"sequence space" of related amino-acid sequences in an 
efficient, organized manner. Four goals guide the 
various variegation plans used herein, preferably: 1) a
very large number (e.g., 10^7) of variants is available, 2)
a very high percentage of the possible variants actually 
appears in detectable amounts, 3) the frequency of 
appearance of the desired variants is relatively 
uniform, and 4) variation occurs only at a limited 
number of amino-acid residues, most preferably at 
residues having side groups directed toward a common 
region on the surface of the potential binding domain. 
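The tension between goals 1), 2) and 4) can be made concrete with a little arithmetic. The sketch below assumes that each varied residue is allowed all twenty amino acids, that variants are equally likely (goal 3), and that clones are sampled independently; the function names and the library size of 10^7 clones are illustrative assumptions.

```python
def max_varied_residues(library_size, variants_per_residue=20):
    """Largest number of fully varied residues whose sequence count
    does not exceed the number of clones available."""
    n = 0
    while variants_per_residue ** (n + 1) <= library_size:
        n += 1
    return n


def expected_coverage(n_variants, n_clones):
    """Expected fraction of equally likely variants that appear at least
    once among n_clones independently drawn clones."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_clones


library_size = 10 ** 7  # assumed number of independent clones
n = max_varied_residues(library_size)
print(n, 20 ** n, round(expected_coverage(20 ** n, library_size), 3))
```

Under these assumptions a library of 10^7 clones can hold every amino acid combination at no more than five residues (20^5 = 3.2 x 10^6 sequences), and even then only about 96% of the combinations are expected to appear at least once. Varying a sixth residue raises the count to 6.4 x 10^7, more than the library can contain.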

This is to be distinguished from the simple use of 
indiscriminate mutagenic agents such as radiation and 
hydroxylamine to modify a gene, where there is no (or 
very oblique) control over the site of mutation. Many 
of the mutations will affect residues that are not a 
part of the binding domain. Moreover, since at a 
reasonable level of mutagenesis, any modified codon is 
likely to be characterized by a single base change, only 
a limited and biased range of possibilities will be 
explored. Equally remote is the use of site-specific
mutagenesis techniques employing mutagenic
oligonucleotides of nonrandomized sequence, since these
techniques do not lend themselves to the production and
testing of a large number of variants. While focused
random mutagenesis techniques are known, the importance
of controlling the distribution of variation has been
largely overlooked.
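The remark that a single base change explores only a limited and biased range of substitutions can be verified directly from the genetic code. The following sketch assumes the standard genetic code; the function name `single_base_neighbors` is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def single_base_neighbors(codon):
    """Amino acids (other than the original) reachable from codon
    by changing exactly one base; stop codons are excluded."""
    original = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa != "*" and aa != original:
                reachable.add(aa)
    return reachable


print(sorted(single_base_neighbors("ATG")))  # substitutions reachable from Met
print(sorted(single_base_neighbors("TGG")))  # substitutions reachable from Trp
```

Because each codon has only nine single-base mutants, no residue can reach more than nine of the nineteen alternative amino acids this way. The Met codon ATG, for example, reaches only six (Ile, Lys, Leu, Arg, Thr, Val).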

In order to obtain the display of a multitude of 
different though related potential binding domains, 
applicants generate a heterogeneous population of 
replicable genetic packages each of which comprises a 
hybrid gene including a first DNA sequence which encodes 
a potential binding domain for the target of interest 
and a second DNA sequence which encodes a display means, 
such as an outer surface protein native to the genetic 
package but not natively associated with the potential 
binding domain (or the parental binding domain to which 
it is related) which causes the genetic package to 
display the corresponding chimeric protein (or a 
processed form thereof) on its outer surface. 

It should be recognized that by expressing a hybrid 
protein which comprises an outer surface transport 
signal not natively associated with the binding domain, 
the utility of the present invention is greatly 
extended. The binding domain need not be that of a 
surface protein of the genetic package (or, in the case 
of a viral package, of its host cell), since the
provided outer surface transport signal is responsible
for achieving the desired display. Thus, it is possible
to display on the surface of a phage, bacterial cell or
bacterial spore a binding domain related to the binding
domain of a normally cytoplasmic binding protein, or the
binding domain of a eukaryotic protein which is not found
on the surface of prokaryotic cells or viruses.

Another important aspect of the invention is that
each potential binding domain remains physically
associated with the particular DNA molecule which
encodes it. Thus, once successful binding domains are
identified, one may readily recover the gene and either
express additional quantities of the novel binding
protein or further mutate the gene. The form that this
association takes is a "replicable genetic package", a
virus, cell or spore which replicates and expresses the
binding domain-encoding gene, and transports the binding
domain to its outer surface.

It is also possible chemically or enzymatically to
modify the PBDs before selection. The selection then
identifies the best modified amino acid sequence. For
example, we could treat the variegated population of
genetic packages that display a variegated population of
binding domains with a protein tyrosine kinase and then
select for binding the target. Any tyrosines on the BD
surface will be phosphorylated and this could affect the
binding properties. Other chemical or enzymatic
modifications are possible.

By virtue of the present invention, proteins are
obtained which can bind specifically to targets other 
than the antigen-combining sites of antibodies. A 
protein is not to be considered a "binding protein" 
merely because it can be bound by an antibody (see 
definition of "binding protein" which follows) . While 
almost any amino acid sequence of more than about 6-8 
amino acids is likely, when linked to an immunogenic 
carrier, to elicit an immune response, any given random 
polypeptide is unlikely to satisfy the stringent 
definition of "binding protein" with respect to minimum 
affinity and specificity for its substrate. It is only 
by testing numerous random polypeptides simultaneously 
(and, in the usual case, controlling the extent and 
character of the sequence variation, i.e. , limiting it 
to residues of a potential binding domain having a 
stable structure, the residues being chosen as more 
likely to affect binding than stability) that this 
obstacle is overcome. 

In one embodiment, the invention relates to: 
a) preparing a variegated population of replicable 
genetic packages, each package including a nucleic 
acid construct coding for an outer-surface- 
displayed potential binding protein other than an 
antibody, comprising (i) a structural signal 
directing the display of the protein (or a 
processed form thereof) on the outer surface of the 
package and (ii) a potential binding domain for
binding said target, where the population
collectively displays a multitude of different 
potential binding domains having a substantially 
predetermined range of variation in sequence, 

b) causing the expression of said protein and the 
display of said protein on the outer surface of 
such packages, 

c) contacting the packages with target material, other 
than an antibody with an exposed antigen-combining 
site, so that the potential binding domains of the
proteins and the target material may interact, and
separating packages bearing a potential binding 
domain that succeeds in binding the target material 
from packages that do not so bind, 

d) recovering and replicating at least one package 
bearing a successful binding domain, 

e) determining the amino acid sequence of the 
successful binding domain of a genetic package 
which bound to the target material, 

f) preparing a new variegated population of replicable 
genetic packages according to step (a) , the 
parental potential binding domain for the potential 
binding domains of said new packages being a
successful binding domain whose sequence was
determined in step (e), and repeating steps (b)-(e)
with said new population, and, when a package
bearing a binding domain of desired binding
characteristics is obtained,

g) abstracting the DNA encoding the desired binding
domain from the genetic package and placing it into 
a suitable expression system. (The binding domain 
may then be expressed as a unitary protein, or as a 
domain of a larger protein) . 
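
The iterative character of steps (a) through (f) can be illustrated with a toy simulation. The following is only a minimal sketch under stated assumptions, not the applicants' procedure: the scoring function `toy_affinity`, the "ideal" sequence, the population size, and the rule for choosing which positions to vary are all hypothetical, and selection is modeled as simple ranking rather than as a physical affinity separation.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def variegate(parent, positions, n_variants, rng):
    """Step (a): vary only the chosen positions; the parent itself is kept."""
    population = [parent]
    for _ in range(n_variants):
        seq = list(parent)
        for pos in positions:
            seq[pos] = rng.choice(AMINO_ACIDS)
        population.append("".join(seq))
    return population


def toy_affinity(seq, ideal):
    """Hypothetical stand-in for binding: positions matching an 'ideal' binder."""
    return sum(a == b for a, b in zip(seq, ideal))


def select(population, ideal, keep_fraction=0.1):
    """Steps (c)-(d): retain the best-scoring fraction of the population."""
    ranked = sorted(population, key=lambda s: toy_affinity(s, ideal), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def directed_evolution(parent, ideal, rounds=3, n_variants=500,
                       sites_per_round=2, seed=0):
    """Steps (e)-(f): the best survivor of one round is the next parent."""
    rng = random.Random(seed)
    history = [parent]
    for r in range(rounds):
        # Assumption: walk along the sequence, varying a few sites per round.
        start = (r * sites_per_round) % len(parent)
        positions = [(start + i) % len(parent) for i in range(sites_per_round)]
        survivors = select(variegate(parent, positions, n_variants, rng), ideal)
        parent = survivors[0]
        history.append(parent)
    return history


if __name__ == "__main__":
    for seq in directed_evolution("AAAAAA", "WYKDEL"):
        print(seq, toy_affinity(seq, "WYKDEL"))
```

Because the parental sequence is retained in each population, the score of the selected sequence in this sketch cannot decrease from one round to the next.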

The invention is not, however, limited to proteins
with a single BD since the method may be applied to any
or all of the BDs of the protein, sequentially or
simultaneously. Nor is the invention limited
to biological synthesis of the binding domains; peptides
having an amino-acid sequence determined by the isolated
DNA can be chemically synthesized.

The invention further relates to a variegated 
population of genetic packages. Said population may be 
used by one user to select for binding to a first 
target, by a second user to select for binding to a 
second target, and so on, as the present invention does 
not require that the initial potential binding domain 
actually bind to the target of interest, and the 
variegation is at residues likely to affect binding. 
The invention also relates to the variegated DNA used in 
preparing such genetic packages. 

The invention likewise encompasses the procedure by 
which the display strategy is verified. The genetic 
packages are engineered to display a single IPBD 
sequence. (Variability may be introduced into DNA
subsequences adjacent to the ipbd subsequence and within
the osp-ipbd gene so that the IPBD will appear on the GP
surface.) A molecule, such as an antibody, having high
affinity for correctly folded IPBD is used to: a) 
detect IPBD on the GP surface, b) screen colonies for 
display of IPBD on the GP surface, or c) select GPs that 
display IPBD from a population, some members of which 
might display IPBD on the GP surface. In one preferred 
embodiment, this verification process (part I) involves: 

1) choosing a GP such as a bacterial cell, bacterial 
spore, or phage, having a suitable outer surface 
protein (OSP) , 

2) choosing a stable IPBD, 

3) designing an amino acid sequence that: a) includes 
the IPBD as a subsequence and b) will cause the 
IPBD to appear on the GP surface, 

4) engineering a gene, denoted osp-ipbd, that: a)
codes for the designed amino acid sequence, b)
provides the necessary genetic regulation, and c) 
introduces convenient sites for genetic 
manipulation, 

5) cloning the osp-ipbd gene into the GP, and 

6) harvesting the transformed GPs and testing them for 
presence of IPBD on the GP surface; this test is 
performed with an affinity molecule having high 
affinity for IPBD, denoted AfM(IPBD).

Once a GP(IPBD) is produced, it can be used many 
times as the starting point for developing different 
novel proteins that bind to a variety of different 
targets. The knowledge of how we engineer the
appearance of one IPBD on the surface of a GP can be
used to design and produce other GP(IPBD)s that display
different IPBDs.

Knowing that a particular genetic package and osp-ipbd
fusion are suitable for the practice of the
invention, we may variegate the genetic packages and
select for binding to a target of interest. Using IPBD
as the PPBD to the first cycle of variegation, we
prepare a wide variety of osp-pbd genes that encode a
wide variety of PBDs. We use an affinity separation to
enrich the population of GP(vgPBD)s for GPs that display
PBDs with binding properties relative to the target that
are superior to the binding properties of the PPBD. An
SBD selected from one variegation cycle becomes the PPBD
to the next variegation cycle. In a preferred
embodiment, Part II of the process of the present
invention involves:

1) picking a target molecule, and an affinity 
separation system which selects for proteins having 
an affinity for that target molecule, 

2) picking a GP(IPBD), 

3) picking a set of several residues in the PPBD to 
vary; the principal indicators of which residues to 
vary include: a) the 3D structure of the IPBD, b) 
sequences of homologous proteins, and c) computer 
or theoretical modeling that indicates which 
residues can tolerate different amino acids without 
disrupting the underlying structure,

4) picking a subset of the residues picked in Part
II.3, to be varied simultaneously; the principal
considerations are the number of different variants 
and which variants are within the detection 
capabilities of the affinity separation system, and 
setting the range of variation; 

5) implementing the variegation by: 

a) synthesizing the part of the osp-pbd gene that 
encodes the residues to be varied using a 
specific mixture of nucleotide substrates for 
some or all of the bases encoding residues 
slated for variation, thereby creating a 
population of DNA molecules, denoted vgDNA, 

b) ligating this vgDNA, by standard methods, into
the operative cloning vector (OCV) (e.g. a
plasmid or bacteriophage),

c) using the ligated DNA to transform cells,
thereby producing a population of transformed
cells,

d) culturing (i.e. increasing in number) the
population of transformed cells and harvesting
the population of GP(PBD)s, said population
being denoted as GP(vgPBD),

e) enriching the population for GPs that bind the 
target by using affinity separation, with the 
chosen target molecule as affinity molecule, 

f) repeating steps II.5.d and II.5.e until a
GP(SBD) having improved binding to the target
is isolated, and

g) testing the isolated SBD or SBDs for affinity
and specificity for the chosen target,

6) repeating steps II.3, II.4, and II.5 until the
desired degree of binding is obtained.

Part II is repeated for each new target material.
Part I need be repeated only if no GP(IPBD) suitable to
a chosen target is available.

For each target, there are a large number of SBDs
that may be found by the method of the present
invention. The process relies on a combination of
protein structural considerations, probabilities, and
targeted mutations with accumulation of information. To
increase the probability that some PBD in the population
will bind to the target, we generate as large a
population as we can conveniently subject to selection-
through-binding in one experiment. Key questions in
management of the method are "How many transformants can
we produce?", and "How small a component can we find
through selection-through-binding?". The optimum level
of variegation is determined by the maximum number of
transformants and the selection sensitivity, so that for
any reasonable sensitivity we may use a progressive
process to obtain a series of proteins with higher and
higher affinity for the chosen target material.

The appended claims are hereby incorporated by
reference into this specification as an enumeration of
the preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows how a phage may be used as a genetic 
package. At (a) we have a wild- type precoat 
protein lodged in the lipid bilayer. The signal 
peptide is in the periplasmic space. At (b) , a 
chimeric precoat protein, with a potential binding 
domain interposed between the signal peptide and 
the mature coat protein sequence, is similarly 
trapped. At (c) and (d) , the signal peptide has 
been cleaved off the wild-type and chimeric 
proteins, respectively, but certain residues of the 
coat protein sequence interact with the lipid 
bilayer to prevent the mature protein from passing 
entirely into the periplasm. At (e) and (f),
mature wild-type and chimeric protein are assembled 
into the coat of a single stranded DNA phage as it 
emerges into the periplasmic space. The phage will 
pass through the outer membrane into the medium 
where it can be recovered and chromatographically 
evaluated . 

Figure 2 depicts (a) the optimal stereochemistry of a 
disulfide bond, based on Creighton, "Disulfide 
Bonds and Protein Stability" (CREI88) (the two 
possible torsion angles about the disulfide bond of 
+ 90° and -90° are equally likely) , and (b) the 
standard geometric parameters for the disulfide 
bond, following Katz and Kossiakoff (KATZ86) . The 
average Cα-Cα distance is 5-6 Å, and the typical S-S
bond length is ~2.0 Å. Many left-hand disulfides
adopt as a preferred geometry χ1=-60°, χ2=-60°,
χ3=-85°, χ2'=-60°, χ1'=-60°, Cα-Cα = 5.88 Å;
right-hand disulfides are more variable.

Figure 3 shows a mini-protein comprising eight residues,
numbered 4 through 11 and in which residues 5 and
10 are joined by a disulfide. The β carbons are
labeled for residues 4, 6, 7, 8, 9, and 11; these
residues are preferred sites of variegation.

Figure 4 shows the Cα of the coat protein of phage f1.

Figure 5 shows the construction of M13-MB51. 

Figure 6 shows construction of MK-BPTI , also known as 
BPTI-III MK. 

Figure 7 illustrates fractionation of the Mini PEPI
library on HNE beads. The abscissa shows pH of
buffer. The ordinates show amount of phage (as
fraction of input phage) obtained at given pH.
Ordinates scaled by 10^3.

Figure 8 illustrates fractionation of the MYMUT PEPI
library on HNE beads. The abscissa shows pH of
buffer. The ordinates show amount of phage (as
fraction of input phage) obtained at given pH.
Ordinates scaled by 10^3.

Figure 9 shows the elution profiles for EpiNE clones 1, 
3, and 7. Each profile is scaled so that the peak 
is 1.0 to emphasize the shape of the curve. 

Figure 10 shows pH profile for the binding of BPTI-III
MK and EpiNE 1 on cathepsin G beads. The abscissa
shows pH of buffer. The ordinates show amount of
phage (as fraction of input phage) obtained at
given pH. Ordinates scaled by 10^3.

Figure 11 shows pH profile for the fractionation of the
MYMUT library on cathepsin G beads. The abscissa
shows pH of buffer. The ordinates show amount of
phage (as fraction of input phage) obtained at
given pH. Ordinates scaled by 10^3.

Figure 12 shows a second fractionation of MYMUT library
over cathepsin G.

Figure 13 shows elution profiles on immobilized
cathepsin G for phage selected for binding to
cathepsin G.

Figure 14 shows the Cαs of BPTI and interaction set #2.

Figure 15 shows the main chain of scorpion toxin
(Brookhaven Protein Data Bank entry 1SN3) residues
20 through 42. CYS25 and CYS41 are shown forming a
disulfide. In the native protein these groups form
disulfides to other cysteines, but no main-chain
motion is required to bring the gamma sulphurs into
acceptable geometry. Residues, other than GLY, are
labeled at the β carbon with the one-letter code.

Figure 16 shows profiles of the elution of phage that
display EpiNE7 and EpiNE7.23 from HNE beads.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 



OVERVIEW 

I. DEFINITIONS AND ABBREVIATIONS
II. THE INITIAL POTENTIAL BINDING DOMAIN

A. Generally 

B. Influence of Target Size on Choice of IPBD 

C. Influence of Target Charge on Choice of IPBD 

D. Other Considerations in the Choice of IPBD 

E. Bovine Pancreatic Trypsin Inhibitor (BPTI) as 
an IPBD 

F. Mini-Proteins as IPBDs 

G. Modified PBDs 

III. VARIEGATION STRATEGY - MUTAGENESIS TO OBTAIN 
POTENTIAL BINDING DOMAINS WITH DESIRED DIVERSITY 

A. Generally 

B. Identification of Residues to be Varied 

C. Determining the Substitution Set for Each 
Parental Residue 

D. Special Considerations Relating to Variegation 
of Mini-Proteins with Essential Cysteines 

E. Planning the Second and Later Rounds of 
Variegation 

IV. DISPLAY STRATEGY - DISPLAYING FOREIGN BINDING 
DOMAINS ON THE SURFACE OF A "GENETIC PACKAGE" 

A. General Requirements for Genetic Package 

B. Phages for Use as Genetic Packages 

C. Bacterial Cells as Genetic Packages 

D. Bacterial Spores as Genetic Packages 

E. Artificial Outer Surface Protein 

F. Designing the osp::ipbd Gene Insert

G. Synthesis of Gene Inserts

H. Operative Cloning Vector

I. Transformation of Cells

J. Verification of Display Strategy

K. Analysis and Correction of Display Problems

V. AFFINITY SELECTION OF TARGET-BINDING MUTANTS

A. Affinity Separation Technology, Generally 

B. Affinity Chromatography, Generally 

C. Fluorescent-Activated Cell Sorting, Generally

D. Affinity Electrophoresis, Generally 

E. Target Materials 

F. Immobilization or Labeling of Target Material 

G. Elution of Lower Affinity PBD-Bearing Packages 

H. Optimization of Affinity Separation 

I. Measuring the Sensitivity of Affinity 
Separation 

J. Measuring the Efficiency of Separation 

K. Reducing Selection due to Non-Specific Binding 

L. Isolation of Genetic Package PBDs with
Binding-to-Target Phenotypes

M. Recovery of Packages

N. Amplifying the Enriched Packages

O. Determining Whether Further Enrichment is
Needed

P. Characterizing the Putative SBDs 

Q. Joint Selections 

R. Selection for Non-Binding 

S. Selection of Potential Binding Domains for
Retention of Structure

T. Engineering of Antagonists

VI. EXPLOITATION OF SUCCESSFUL BINDING DOMAINS AND 
CORRESPONDING DNAS 

A. Generally 

B. Production of Novel Binding Proteins 

C. Mini-Protein Production

D. Uses of Novel Binding Proteins 

VII. EXAMPLES 

I. DEFINITIONS AND ABBREVIATIONS 

Let K_d(x,y) be a dissociation constant,

\[ K_d(x,y) = \frac{[x][y]}{[x:y]} \]

For the purposes of the appended claims, a protein P is
a binding protein if (1) for one molecular, ionic or
atomic species A, other than the variable domain of an
antibody, the dissociation constant K_d(P,A) < 10^-6
moles/liter (preferably, < 10^-7 moles/liter), and (2) for
a different molecular, ionic, or atomic species B,
K_d(P,B) > 10^-4 moles/liter (preferably, > 10^-1 moles/liter).
As a result of these two conditions, the protein P
exhibits specificity for A over B, and a minimum degree
of affinity (or avidity) for A.
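
The two-part test above can be written out as a short check. The sketch below is illustrative only: the function names and the `preferred` flag are hypothetical, and the thresholds are simply the values recited in the definition (dissociation constants in moles/liter).

```python
def dissociation_constant(free_x, free_y, complex_xy):
    """K_d(x,y) = [x][y] / [x:y], all concentrations in moles/liter."""
    return free_x * free_y / complex_xy


def is_binding_protein(kd_a, kd_b, preferred=False):
    """Apply conditions (1) and (2) of the definition.

    kd_a: K_d(P,A) for the species A that P is meant to bind.
    kd_b: K_d(P,B) for a different species B.
    preferred: use the stricter 'preferably' limits instead of the minimum ones.
    """
    max_kd_a = 1e-7 if preferred else 1e-6   # condition (1): affinity for A
    min_kd_b = 1e-1 if preferred else 1e-4   # condition (2): weak binding to B
    return kd_a < max_kd_a and kd_b > min_kd_b


if __name__ == "__main__":
    # Hypothetical values: nanomolar binding to A, millimolar binding to B.
    print(is_binding_protein(1e-9, 1e-3))
    print(is_binding_protein(1e-9, 1e-3, preferred=True))
```

Note that the test is relative to a particular pair (A, B): the same protein may satisfy the definition for one choice of B and fail it for another.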

The exclusion of "variable domain of an antibody"
in (1) above is intended to make clear that for the
purposes herein a protein is not to be considered a
"binding protein" merely because it is antigenic.
However, an antigen may nonetheless qualify as a binding
protein because it specifically binds to a substance
other than an antibody, e.g., an enzyme for its
substrate, or a hormone for its cellular receptor.
Additionally, it should be pointed out that "binding
protein" may include a protein which binds specifically
to the Fc of an antibody, e.g., staphylococcal protein A.
Normally, the binding protein will not be an
antibody or an antigen-binding derivative thereof. An
antibody is a crosslinked complex of four polypeptides
(two heavy and two light chains). The light chains of
IgG have a molecular weight of ~23,000 daltons and the
heavy chains of ~53,000 daltons. A single binding unit
is composed of the variable region of a heavy chain (V_H)
and the variable region of a light chain (V_L), each about
110 amino-acid residues. The V_H and V_L regions are held
in proximity by a disulfide bond between the adjoining C_L
and C_H1 regions; altogether, these total 440 residues and
correspond to an Fab fragment. Derivatives of
antibodies include Fab fragments and the individual
variable light and heavy domains. A special case of
antibody derivative is a "single chain antibody." A
"single-chain antibody" is a single chain polypeptide
comprising at least 200 amino acids, said amino acids
forming two antigen-binding regions connected by a
peptide linker that allows the two regions to fold
together to bind the antigen in a manner akin to that of
an Fab fragment. Either the two antigen-binding regions
must be variable domains of known antibodies, or they
must (1) each fold into a β barrel of nine strands that
are spatially related in the same way as are the nine
strands of known antibody variable light or heavy
domains, and (2) fit together in the same way as do the
variable domains of said known antibody. Generally
speaking, this will require that, with the exception of
the amino acids corresponding to the hypervariable
region, there is at least 88% homology with the amino
acids of the variable domain of a known antibody.

While the present invention may be used to develop
novel antibodies through variegation of codons
corresponding to the hypervariable region of an
antibody's variable domain, its primary utility resides
in the development of binding proteins which are not
antibodies or even variable domains of antibodies.
Novel antibodies can be obtained by immunological
techniques; novel enzymes, hormones, etc. cannot.

It will be appreciated that, as a result of
evolution, the antigen-binding domains of antibodies
have acquired a structure which tolerates great
variability of sequence in the hypervariable regions.
The remainder of the variable domain is made up of
constant regions forming a distinctive structure, a
nine-strand β barrel, which holds the hypervariable regions
(inter-strand loops) in a fixed relationship with each
other. Most other binding proteins lack this molecular
design which facilitates diversification of binding
characteristics. Consequently, the successful
development of novel antibodies by modification of
sequences encoding known hypervariable regions (which,
in nature, vary from antibody to antibody) does not
provide any guidance or assurance of success in the
development of novel, non-immunoglobulin binding
proteins.

It should further be noted that the affinity of
antibodies for their target epitopes is typically on the
order of 10^6 to 10^10 liters/mole; many enzymes exhibit
much greater affinities (10^9 to 10^15 liters/mole) for
their preferred substrates. Thus, if the goal is to
develop a binding protein with a very high affinity for
a target of interest, e.g., greater than 10^10 liters/mole, the
antibody design may in fact be unduly limiting.
Furthermore, the complementarity-determining residues of
an antibody comprise many residues, 30 to 50. In most
cases, it is not known which of these residues
participate directly in binding antigen. Thus, picking
an antibody as PPBD does not allow us to focus
variegation to a small number of residues.
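
The affinities in the preceding paragraph are association constants (liters/mole), whereas the definition of "binding protein" is stated in terms of dissociation constants (moles/liter). For a simple bimolecular equilibrium one is the reciprocal of the other, so the two scales can be compared directly. The helper below is a minimal illustrative sketch; the function names are hypothetical.

```python
def kd_from_ka(ka_liters_per_mole):
    """Dissociation constant (moles/liter) from association constant (liters/mole)."""
    return 1.0 / ka_liters_per_mole


def ka_from_kd(kd_moles_per_liter):
    """Association constant (liters/mole) from dissociation constant (moles/liter)."""
    return 1.0 / kd_moles_per_liter


if __name__ == "__main__":
    # Ranges quoted in the text, expressed as dissociation constants.
    for label, ka_low, ka_high in [("antibodies", 1e6, 1e10),
                                   ("enzymes", 1e9, 1e15)]:
        print(f"{label}: K_d from {kd_from_ka(ka_low):.0e} "
              f"down to {kd_from_ka(ka_high):.0e} moles/liter")
```

On this scale an association constant greater than 10^10 liters/mole corresponds to a dissociation constant below 10^-10 moles/liter.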

Most larger proteins fold into distinguishable 
globules called domains (ROSS81) . Protein domains have 
been defined various ways, but all definitions fall into 
one of three classes: a) those that define a domain in 
terms of 3D atomic coordinates, b) those that define a 
domain as an isolable, stable fragment of a larger 
protein, and c) those that define a domain based on
protein sequence homology plus a method from class a) or
b). Frequently, different methods of defining domains
applied to a single protein yield identical or very
similar domain boundaries. The diversity of definitions
for domains stems from the many ways that protein
domains are perceived to be important, including the
concept of domains in predicting the boundaries of
stable fragments, and the relationship of domains to
protein folding, function, stability, and evolution.
The present invention emphasizes the retention of the
structured character of a domain even though its surface
residues are mutated. Consequently, definitions of
"domain" which emphasize stability (retention of the
overall structure in the face of perturbing forces such
as elevated temperatures or chaotropic agents) are
favored, though atomic coordinates and protein sequence
homology are not completely ignored.

When a domain of a protein is primarily responsible 
for the protein's ability to specifically bind a chosen 
target, it is referred to herein as a "binding domain" 
(BD) . A preliminary operation is to engineer the 
appearance of a stable protein domain, denoted as an 
"initial potential binding domain" (IPBD) , on the 
surface of a genetic package. 

The term "variegated DNA" (vgDNA) refers to a 
mixture of DNA molecules of the same or similar length 
which, when aligned, vary at some codons so as to encode 
at each such codon a plurality of different amino acids,
but which encode only a single amino acid at other codon
positions. It is further understood that in variegated
DNA, the codons which are variable, and the range and 
frequency of occurrence of the different amino acids 
which a given variable codon encodes , are determined in 
advance by the synthesizer of the DNA, even though the 
synthetic method does not allow one to know, a priori, 
the sequence of any individual DNA molecule in the 
mixture. The number of designated variable codons in 
the variegated DNA is preferably no more than 20 codons, 
and more preferably no more than 5-10 codons. The mix 
of amino acids encoded at each variable codon may differ 
from codon to codon. 
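
The preference for a small number of variable codons can be illustrated with simple arithmetic. The sketch below is offered only as an illustration under stated assumptions: it assumes each variable codon is drawn uniformly from 32 codons that together encode all 20 amino acids (an NNK-type mixture, which is not specified in this section), that all DNA variants are equally abundant, and that the number of independent transformants (here 10^8) is a hypothetical figure.

```python
import math


def library_sizes(n_variable_codons, codons_per_site=32, amino_acids_per_site=20):
    """Return (number of DNA sequences, number of protein sequences)."""
    n_dna = codons_per_site ** n_variable_codons
    n_protein = amino_acids_per_site ** n_variable_codons
    return n_dna, n_protein


def fraction_sampled(n_dna_variants, n_transformants):
    """Expected fraction of equally likely DNA variants present at least once.

    Uses the Poisson approximation 1 - exp(-N/V); expm1 keeps precision
    when N/V is very small.
    """
    return -math.expm1(-n_transformants / n_dna_variants)


if __name__ == "__main__":
    transformants = 1e8  # hypothetical number of independent transformants
    for n in (5, 10, 20):
        n_dna, n_protein = library_sizes(n)
        print(n, n_dna, n_protein, fraction_sampled(n_dna, transformants))
```

Under these assumptions, five variable codons give a population that a library of this size samples almost completely, while ten or more variable codons give far more sequences than can be represented.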

A population of genetic packages into which
variegated DNA has been introduced is likewise said to
be "variegated".

For the purposes of this invention, the term 
"potential binding protein" refers to a protein encoded 
by one species of DNA molecule in a population of 
variegated DNA wherein the region of variation appears 
in one or more subsequences encoding one or more 
segments of the polypeptide having the potential of 
serving as a binding domain for the target substance. 

From time to time, it may be helpful to speak of 
the "parent sequence" of the variegated DNA. When the 
novel binding domain sought is an analogue of a known 
binding domain, the parent sequence is the sequence that 
encodes the known binding domain. The variegated DNA
will be identical with this parent sequence at one or
more loci, but will diverge from it at chosen loci.
When a potential binding domain is designed from first
principles, the parent sequence is a sequence which
encodes the amino acid sequence that has been predicted
to form the desired binding domain, and the variegated
DNA is a population of "daughter DNAs" that are related
to that parent by a recognizable sequence similarity.

A "chimeric protein" is a protein composed of a 
first amino acid sequence substantially corresponding to 
the sequence of a protein or to a large fragment of a 
protein (20 or more residues) expressed by the species 
in which the chimeric protein is expressed and a second 
amino acid sequence that does not substantially 
correspond to an amino acid sequence of a protein 
expressed by the first species but that does 
substantially correspond to the sequence of a protein 
expressed by a second and different species of organism. 
The second sequence is said to be foreign to the first 
sequence . 

One amino acid sequence of the chimeric proteins of
the present invention is typically derived from an outer
surface protein of a "genetic package" as hereafter
defined. The second amino acid sequence is one which,
if expressed alone, would have the characteristics of a
protein (or a domain thereof) but is incorporated into
the chimeric protein as a recognizable domain thereof.
It may appear at the amino or carboxy terminal of the
first amino acid sequence (with or without an
intervening spacer), or it may interrupt the first amino
acid sequence. The first amino acid sequence may
correspond exactly to a surface protein of the genetic
package, or it may be modified, e.g., to facilitate the
display of the binding domain.

In the present invention, the words "select" and 
"selection" are used in the genetic sense; i.e. a 
biological process whereby a phenotypic characteristic 
is used to enrich a population for those organisms 
displaying the desired phenotype . 

One affinity separation is called a "separation 
cycle"; one pass of variegation followed by as many 
separation cycles as are needed to isolate an SBD, is 
called a "variegation cycle". The amino acid sequence 
of one SBD from one round becomes the PPBD to the next 
variegation cycle. We perform variegation cycles 

iteratively until the desired affinity and specificity 
of binding between an SBD and chosen target are 
achieved . 

The following abbreviations will be used throughout 
the present specification: 

Abbreviation    Meaning

GP              Genetic Package, e.g. a bacteriophage
wtGP            Wild-type GP
X               Any protein
x               The gene for protein X
BD              Binding Domain
BPTI            Bovine pancreatic trypsin inhibitor, identical to
                aprotinin (Merck Index, entry 784, p. 119
                (SEQ ID NO:44))
IPBD            Initial Potential Binding Domain, e.g. BPTI
PBD             Potential Binding Domain, e.g. a derivative of BPTI
SBD             Successful Binding Domain, e.g. a derivative of
                BPTI selected for binding to a target
PPBD            Parental Potential Binding Domain, i.e. an IPBD or
                an SBD from a previous selection
OSP             Outer Surface Protein, e.g. coat protein of a
                phage or LamB from E. coli
OSP-PBD         Fusion of an OSP and a PBD, order of fusion not
                specified
OSTS            Outer Surface Transport Signal
GP(x)           A genetic package containing the x gene
GP(X)           A genetic package that displays X on its outer
                surface
GP(osp-pbd)     GP containing an osp-pbd gene
GP(OSP-PBD)     A genetic package that displays PBD on its
                outside as a fusion to OSP
GP(pbd)         GP containing a pbd gene, osp implicit
GP(PBD)         A genetic package displaying PBD on its outside,
                OSP unspecified
{Q}             An affinity matrix supporting "Q", e.g.
                {T4 lysozyme} is T4 lysozyme attached to an
                affinity matrix
AfM(W)          A molecule having affinity for "W", e.g. trypsin
                is an AfM(BPTI)
AfM(W)*         AfM(W) carrying a label, e.g. 125I
XINDUCE         A chemical that can induce expression of a gene,
                e.g. IPTG for the lacUV5 promoter
OCV             Operative Cloning Vector
K_d             A bimolecular dissociation constant,
                K_d = [A][B]/[A:B]
K_T             K_T = [T][SBD]/[T:SBD] (T is a target)
K_N             K_N = [N][SBD]/[N:SBD] (N is a non-target)
DoAMoM          Density of AfM(W) on affinity matrix
mfaa            Most-Favored amino acid
lfaa            Least-Favored amino acid
Abun(x)         Abundance of DNA molecules encoding amino acid x
OMP             Outer membrane protein
nt              nucleotide
SP-I            Signal-sequence Peptidase I
Y_DQ            Yield of ssDNA up to Q bases long
L_DNA           Maximum length of ssDNA that can be synthesized in
                acceptable yield
Y_pl            Yield of plasmid DNA per volume of culture
L_eff           DNA ligation efficiency
M_ntv           Maximum number of transformants produced from
                Y_D100 DNA of insert
C_eff           Efficiency of chromatographic enrichment,
                enrichment per pass
C_sensi         Sensitivity of chromatographic separation, can
                find 1 in N
N_chrom         Maximum number of enrichment cycles per
                variegation cycle
S_err           Error level in synthesizing vgDNA
::              in-frame genetic fusion or protein produced from
                in-frame fused gene

Single-letter codes for amino acids and nucleotides are
given in Table 1.
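
To show how the separation parameters listed above might interact, the following sketch treats enrichment as ideally multiplicative: each pass is assumed to enrich the desired packages by the same factor C_eff, so that N_chrom passes could in principle recover a component present at 1 part in C_eff^N_chrom. This relation is an idealizing assumption made for illustration and is not stated in this section; the function names and the numerical values are hypothetical.

```python
def passes_needed(one_in_n, c_eff):
    """Smallest number of passes whose cumulative enrichment reaches one_in_n.

    one_in_n: initial rarity of the desired package (1 part in one_in_n).
    c_eff: assumed enrichment factor per pass (integer greater than 1).
    """
    if c_eff <= 1:
        raise ValueError("c_eff must be greater than 1")
    passes, gain = 0, 1
    while gain < one_in_n:
        gain *= c_eff
        passes += 1
    return passes


def ideal_sensitivity(c_eff, n_chrom):
    """Rarity (1 in N) recoverable after n_chrom ideal passes."""
    return c_eff ** n_chrom


if __name__ == "__main__":
    # Hypothetical figures: 100-fold enrichment per pass, at most 4 passes.
    print(passes_needed(10 ** 6, 100))
    print(ideal_sensitivity(100, 4))
```

In practice the enrichment per pass need not be constant, so such a calculation gives at most a rough planning estimate.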

*** 

II. THE INITIAL POTENTIAL BINDING DOMAIN (IPBD):

II.A. Generally

The initial potential binding domain may be: 1) a
domain of a naturally occurring protein, 2) a non-
naturally occurring domain which substantially
corresponds in sequence to a naturally occurring domain,
but which differs from it in sequence by one or more
substitutions, insertions or deletions, 3) a domain
substantially corresponding in sequence to a hybrid of 
subsequences of two or more naturally occurring 
proteins, or 4) an artificial domain designed entirely 
on theoretical grounds based on knowledge of amino acid 
geometries and statistical evidence of secondary 
structure preferences of amino acids. (However, the 
limitations of a priori protein design prompted the 
present invention.) Usually, the domain will be a known 
binding domain, or at least a homologue thereof, but it 
may be derived from a protein which, while not 
possessing a known binding activity, possesses a 
secondary or higher structure that lends itself to 
binding activity (clefts, grooves, etc . ) . The protein 
to which the IPBD is related need not have any specific 
affinity for the target material . 

In determining whether sequences should be deemed
to "substantially correspond", one should consider the
following issues: the degree of sequence similarity
when the sequences are aligned for best fit according to
standard algorithms, the similarity in the connectivity
patterns of any crosslinks (e.g., disulfide bonds), the
degree to which the proteins have similar
three-dimensional structures, as indicated by, e.g.,
X-ray diffraction analysis or NMR, and the degree to
which the sequenced proteins have similar biological
activity. In this context, it should be noted that among
the serine protease inhibitors, there are families of
proteins recognized to be homologous in which there are
pairs of members with as little as 30% sequence homology.

A candidate IPBD should meet the following
criteria:

1) a domain exists that will remain stable under the
conditions of its intended use (the domain may
comprise the entire protein that will be inserted,
e.g. BPTI (SEQ ID NO:44), α-conotoxin GI, or
CMTI-III),

2) knowledge of the amino acid sequence is obtainable,
and

3) a molecule is obtainable having specific and high
affinity for the IPBD, AfM(IPBD).

Preferably, in order to guide the variegation strategy, 
knowledge of the identity of the residues on the 
domain's outer surface, and their spatial relationships, 
is obtainable; however, this consideration is less 
important if the binding domain is small, e.g., under 40
residues.

Preferably, the IPBD is no larger than necessary
because small SBDs (for example, less than 30 amino
acids) can be chemically synthesized and because it is
easier to arrange restriction sites in smaller
amino-acid sequences. For PBDs smaller than about 40
residues, an added advantage is that the entire
variegated pbd gene can be synthesized in one piece. In
that case, we need arrange only suitable restriction
sites in the osp gene. A smaller protein minimizes the
metabolic strain on the GP or the host of the GP. The
IPBD is preferably smaller than about 200 residues. The
IPBD must also be large enough to have acceptable
binding affinity and specificity. For an IPBD lacking
covalent crosslinks, such as disulfide bonds, the IPBD
is preferably at least 40 residues; it may be as small
as six residues if it contains a crosslink. These
small, crosslinked IPBDs, known as "mini-proteins", are
discussed in more detail later in this section.
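
The size preferences stated above can be sketched as a
simple screening function. This is only an illustrative
sketch: the function name is hypothetical, and treating the
"about" figures in the text as hard cut-offs is an
assumption made for the example.

```python
def ipbd_size_acceptable(n_residues: int, has_covalent_crosslink: bool) -> bool:
    """Rough size screen for a candidate IPBD (cut-offs treated as hard limits)."""
    max_residues = 200        # "preferably smaller than about 200 residues"
    min_uncrosslinked = 40    # "preferably at least 40 residues" without crosslinks
    min_crosslinked = 6       # "as small as six residues if it contains a crosslink"
    minimum = min_crosslinked if has_covalent_crosslink else min_uncrosslinked
    return minimum <= n_residues < max_residues


# BPTI (58 residues, disulfide bonded) passes the screen.
bpti_ok = ipbd_size_acceptable(58, has_covalent_crosslink=True)
# A 13-residue peptide passes only if it is crosslinked.
small_crosslinked_ok = ipbd_size_acceptable(13, has_covalent_crosslink=True)
small_linear_ok = ipbd_size_acceptable(13, has_covalent_crosslink=False)
```

A real selection would weigh these size limits together
with the stability, structural and affinity-molecule
criteria listed in this section.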

Some candidate IPBDs, which meet the conditions set
forth above, will be more suitable than others.
Information about candidate IPBDs that will be used to
judge the suitability of the IPBD includes: 1) a
structure (knowledge strongly preferred), 2) one or more
sequences homologous to the IPBD (the more homologous
sequences known, the better), 3) the pI of the IPBD
(knowledge desirable when target is highly charged), 4)
the stability and solubility as a function of
temperature, pH and ionic strength (preferably known to
be stable over a wide range and soluble in conditions of
intended use), 5) ability to bind metal ions such as Ca++
or Mg++ (knowledge preferred; binding per se, no
preference), 6) enzymatic activities, if any (knowledge
preferred, activity per se has uses but may cause
problems), 7) binding properties, if any (knowledge
preferred, specific binding also preferred), 8)
availability of a molecule having specific and strong
affinity (Kd < 10^-11 M) for the IPBD (preferred), 9)
availability of a molecule having specific and medium
affinity (10^-8 M < Kd < 10^-6 M) for the IPBD (preferred),
10) the sequence of a mutant of IPBD that does not bind
to the affinity molecule(s) (preferred), and 11)
absorption spectrum in visible, UV, NMR, etc.
(characteristic absorption preferred).

If only one species of molecule having affinity for 
IPBD (AfM(IPBD)) is available, it will be used to: a)
detect the IPBD on the GP surface, b) optimize 
expression level and density of the affinity molecule on 
the matrix, and c) determine the efficiency and 
sensitivity of the affinity separation. As noted above, 
however, one would prefer to have available two species 
of AfM(IPBD), one with high and one with moderate
affinity for the IPBD. The species with high affinity 
would be used in initial detection and in determining 
efficiency and sensitivity, and the species with 
moderate affinity would be used in optimization. 

If the IPBD is not itself a binding domain of a 
known binding protein, or if its native target has not 
been purified, an antibody raised against the IPBD may 
be used as the affinity molecule. Use of an antibody 
for this purpose should not be taken to mean that the 
antibody is the ultimate target. 

There are many candidate IPBDs for which all of the
above information is available or is reasonably
practical to obtain, for example, bovine pancreatic
trypsin inhibitor (BPTI, 58 residues), CMTI-III (29
residues), crambin (46 residues), third domain of
ovomucoid (56 residues), heat-stable enterotoxin (ST-Ia
of E. coli) (18 residues), α-Conotoxin GI (13 residues),
[...] (22 residues), Conus King Kong mini-protein ([...]
residues), T4 lysozyme (164 residues), [...] chemical
cross linking [...] structures of related proteins, or
from theoretical calculations. 3D structural information
obtained by X-ray diffraction, neutron diffraction or NMR
is preferred because these methods allow localization of
almost all of the atoms to within defined limits. [...]
several preferred IPBDs. Works related to [...] of 3D
structure of small proteins [...]: CHAZ85, PEAS90,
PEAS88, CLOR86, CLOR87a, HEIT89, WAGN7[...] and PARD[...].

In some cases, a protein having some affinity for
the target may be a preferred IPBD even though some
other criteria are not optimally met. [...] the V1
domain of CD4 is a good choice as IPBD for a [...]
mutations in the region 42 to [...] binding and that
other mutations either have [...] effect or [...]
structure of V1. Similarly, tumor necrosis factor (TNF)
would be a good initial choice if one wants a TNF-like
molecule having higher affinity for the TNF receptor.

Membrane-bound proteins are not preferred IPBDs,
though they may serve as a source of outer surface
transport signals. One should distinguish between
membrane-bound proteins, such as LamB or OmpF, that
cross the membrane several times forming a structure
that is embedded in the lipid bilayer and in which the
exposed regions are the loops that join trans-membrane
segments, from non-embedded proteins, such as the
soluble domains of CD4, that are simply anchored to the
membrane. This is an important distinction because it
is quite difficult to create a soluble derivative of a
membrane-bound protein. Soluble binding proteins are in
general more useful since purification is simpler and
they are more tractable and more versatile assay
reagents.

Most of the PBDs derived from a PPBD according to
the process of the present invention will have been
derived by variegation at residues having side groups
directed toward the solvent. Reidhaar-Olson and Sauer
(REID88a) found that exposed residues can accept a wide
range of amino acids, while buried residues are more
limited in this regard. Surface mutations typically
have only small effects on melting temperature of the
PBD, but may reduce the stability of the PBD. Hence the
chosen IPBD should have a high melting temperature (50°C
acceptable, the higher the better; BPTI melts at 95°C)
and be stable over a wide pH range (8.0 to 3.0 [...])
[...] through binding will retain sufficient stability.
Preferably, the substitutions in the [...] of the [...]
proteins containing [...] likely than other [...] to
yield derivatives that will [...] GP(IPBD) for a
particular target; criteria are given below that relate
target size and charge to the choice of IPBD.

II.B. Influence of target size on choice of IPBD:

If the target is a protein or other macromolecule a
preferred embodiment of the IPBD is a small protein such
as the Cucurbita maxima trypsin inhibitor III (29
residues), BPTI from Bos taurus (58 residues), crambin
from rape seed (46 residues), or the third domain of
ovomucoid from Coturnix coturnix japonica (Japanese
quail) (56 residues), because targets from this class
have clefts and grooves that can accommodate small
proteins in highly specific ways. If the target is a
macromolecule lacking a compact structure, such as
starch, it should be treated as if it were a small
molecule. Extended macromolecules with defined 3D
structure, such as collagen, should be treated as large
molecules.

If the target is a small molecule, such as a
steroid, a preferred embodiment of the IPBD is a protein
of about 80-200 residues, such as ribonuclease from Bos
taurus (124 residues), ribonuclease from Aspergillus
oryzae (104 residues), hen egg white lysozyme from
Gallus gallus (129 residues), azurin from Pseudomonas
aeruginosa (128 residues), or T4 lysozyme (164
residues), because such proteins have clefts and grooves
into which the small target molecules can fit. The
Brookhaven Protein Data Bank contains 3D structures for
all of the proteins listed. Genes encoding proteins as
large as T4 lysozyme can be manipulated by standard
techniques for the purposes of this invention.

If the target is a mineral, insoluble in water, one 
considers the nature of the molecular surface of the 
mineral. Minerals that have smooth surfaces, such as 
crystalline silicon, are best addressed with medium to 
large proteins, such as ribonuclease, as IPBD in order 
to have sufficient contact area and specificity. 
Minerals with rough, grooved surfaces, such as zeolites, 
could be bound either by small proteins, such as BPTI,
or larger proteins, such as T4 lysozyme. 
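
The target-size guidance of this subsection can be
summarized as a lookup table. The sketch below is
illustrative only: the category labels, the dictionary
name and the function name are hypothetical, and the
"about 29-58 residues" range merely summarizes the small
proteins named above.

```python
SMALL_PROTEIN = "small protein (about 29-58 residues, e.g. CMTI-III, crambin, BPTI)"
LARGER_PROTEIN = "protein of about 80-200 residues (e.g. ribonuclease, T4 lysozyme)"
EITHER = "either a small protein (e.g. BPTI) or a larger one (e.g. T4 lysozyme)"

# Keys are hypothetical labels for the target classes discussed in the text.
IPBD_BY_TARGET = {
    "compact macromolecule": SMALL_PROTEIN,
    # Extended but structured macromolecules (e.g. collagen) are treated as large.
    "extended macromolecule with defined 3D structure": SMALL_PROTEIN,
    # Macromolecules lacking compact structure (e.g. starch) are treated as small.
    "macromolecule lacking compact structure": LARGER_PROTEIN,
    "small molecule": LARGER_PROTEIN,
    # Smooth minerals (e.g. crystalline silicon) need a large contact area.
    "smooth mineral": LARGER_PROTEIN,
    # Rough, grooved minerals (e.g. zeolites) can be bound by either size class.
    "rough mineral": EITHER,
}


def suggest_ipbd(target_class: str) -> str:
    """Return the preferred IPBD size class for a target class label."""
    return IPBD_BY_TARGET[target_class]
```

For example, `suggest_ipbd("small molecule")` returns the
larger-protein recommendation, matching the steroid case
described above.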

II.C. Influence of target charge on choice of IPBD:

Electrostatic repulsion between molecules of like
charge can prevent molecules with highly complementary
surfaces from binding. Therefore, it is preferred that,
under the conditions of intended use, the IPBD and the
target molecule either have opposite charge or that one
of them is neutral. In some cases it has been observed
that protein molecules bind in such a way that like
charged groups are juxtaposed by including oppositely
charged counter ions in the molecular interface. Thus,
inclusion of counter ions can reduce or eliminate
electrostatic repulsion and the user may elect to
include ions in the eluants used in the affinity
separation step. Polyvalent ions are more effective at
reducing repulsion than monovalent ions.

II.D. Other considerations in the choice of IPBD:

If the chosen IPBD is an enzyme, it may be
necessary to change one or more residues in the active
site to inactivate enzyme function. For example, if the
IPBD were T4 lysozyme and the GP were E. coli cells or
M13, we would need to inactivate the lysozyme because
otherwise it would lyse the cells. If, on the other
hand, the GP were ΦX174, then inactivation of lysozyme
may not be needed because T4 lysozyme can be
overproduced inside E. coli cells without detrimental
effects and ΦX174 forms intracellularly. It is
preferred to inactivate enzyme IPBDs that might be
harmful to the GP or its host by substituting mutant
amino acids at one or more residues of the active site.
It is permitted to vary one or more of the residues that
were changed to abolish the original enzymatic activity
of the IPBD. Those GPs that receive osp-pbd genes
encoding an active enzyme may die, but the majority of
sequences will not be deleterious.

If the binding protein is intended for therapeutic 
use in humans or animals, the IPBD may be chosen from 
proteins native to the designated recipient to minimize 
the possibility of antigenic reactions. 

II.E. Bovine Pancreatic Trypsin Inhibitor (BPTI) as an IPBD:

BPTI is an especially preferred IPBD because it
meets or exceeds all the criteria: it is a small, very
stable protein with a well known 3D structure. Marks et
al. (MARK86) have shown that a fusion of the phoA signal
peptide gene fragment and DNA coding for the mature form
of BPTI caused native BPTI to appear in the periplasm of
E. coli, demonstrating that there is nothing in the
structure of BPTI to prevent its being secreted.

The structure of BPTI is maintained even when one
or another of the disulfides is removed, either by
chemical blocking or by genetic alteration of the
amino-acid sequence. The stabilizing influence of the
disulfides in BPTI is not equally distributed.
Goldenberg (GOLD85) reports that blocking CYS14 and
CYS38 lowers the Tm of BPTI to about 75°C while chemical
blocking of either of the other disulfides lowers Tm to
below 40°C. Chemically blocking a disulfide may lower
Tm more than mutating the cysteines to other amino-acid
types because the bulky blocking groups are more
destabilizing than removal of the disulfide. Marks et
al. (MARK87) replaced both CYS14 and CYS38 with either
two alanines or two threonines. The CYS14/CYS38 cystine
bridge that Marks et al. removed is the one very close
to the scissile bond in BPTI; surprisingly, both mutant
molecules functioned as trypsin inhibitors. Schnabel et
al. (SCHN86) report preparation of aprotinin (C14A, C38A)
by use of Raney nickel. Eigenbrot et al. (EIGE90)
report the X-ray structure of BPTI (C30A/C51A) which is
stable to at least 50°C. The backbone of this mutant is
as similar to BPTI as are the backbones of BPTI
molecules that sit in different crystal lattices. This
indicates that BPTI is redundantly stable and so is
likely to fold into approximately the same structure
despite numerous surface mutations. Using the knowledge
of homologues, vide infra, we can infer which residues
should not be varied if the basic BPTI structure is to
be maintained.

The 3D structure of BPTI has been determined at
high resolution by X-ray diffraction (HUBE77, MARQ83,
WLOD84, WLOD87a, WLOD87b), neutron diffraction (WLOD84),
and by NMR (WAGN87). In one of the X-ray structures
deposited in the Brookhaven Protein Data Bank, entry
6PTI, there was no electron density for A58, indicating
that A58 has no uniquely defined conformation. Thus we
know that the carboxy group does not make any essential
interaction in the folded structure. The amino terminus
of BPTI is very near to the carboxy terminus.
Goldenberg and Creighton reported on circularized BPTI
and circularly permuted BPTI (GOLD83). Some proteins
homologous to BPTI have more or fewer residues at either
terminus.

BPTI has been called "the hydrogen atom of protein
folding" and has been the subject of numerous
experimental and theoretical studies (STAT87, SCHW87,
GOLD83, CHAZ83, CREI74, CREI77a, CREI77b, CREI80,
SIEK87, SINH90, RUEH73, HUBE74, HUBE75, HUBE77 and
others).

BPTI has the added advantage that at least 59
homologous proteins are known. Table 13 shows the
sequences of 39 homologues. A tally of ionizable groups
in 59 homologues is shown in Table 14 and the composite
of amino acid types occurring at each residue is shown
in Table 15.

BPTI is freely soluble and is not known to bind 
metal ions. BPTI has no known enzymatic activity. BPTI 
is not toxic. 

All of the conserved residues are buried; of the
six fully conserved residues only G37 has noticeable
exposure. The solvent accessibility of each residue in
BPTI is given in Table 16 which was calculated from the
entry "6PTI" in the Brookhaven Protein Data Bank with a
solvent radius of 1.4 Å, the atomic radii given in Table
7, and the method of Lee and Richards (LEEB71). Each of
the 52 non-conserved residues can accommodate two or
more kinds of amino acids. By independently
substituting at each residue only those amino acids
already observed at that residue, we could obtain
approximately 1.6·10^43 different amino acid sequences,
most of which will fold into structures very similar to
BPTI.
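
The sequence count quoted above is a product, over the
varied positions, of the number of amino-acid types
allowed at each position. The sketch below illustrates
that arithmetic; the function name and the three-position
toy input are hypothetical and are not taken from Table
15. The final line only back-calculates the average number
of types per position implied by the two figures quoted in
the text (1.6·10^43 sequences over 52 positions).

```python
import math


def count_sequences(allowed_types_per_position):
    """Number of distinct sequences when positions are varied independently."""
    return math.prod(len(types) for types in allowed_types_per_position)


# Hypothetical toy input: three positions allowing 2, 3 and 4 amino-acid types.
toy_positions = ["KR", "AGS", "FYWL"]
toy_count = count_sequences(toy_positions)

# Geometric-mean number of types per position implied by the quoted figures.
implied_mean_types = (1.6e43) ** (1 / 52)
```

Under these figures the implied average is between six and
seven amino-acid types per non-conserved position.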

BPTI will be especially useful as an IPBD for
macromolecular targets. BPTI and BPTI homologues bind
tightly and with high specificity to a number of enzyme
macromolecules.

BPTI is strongly positively charged except at very
high pH, thus BPTI is useful as IPBD for targets that
are not also strongly positive under the conditions of
intended use. There exist homologues of BPTI, however,
having quite different charges (viz. SCI-III from Bombyx
mori at -7 and the trypsin inhibitor from bovine
colostrum at -1). Once a genetic package is found that
displays BPTI on its surface, the sequence of the BPTI
domain can be replaced by one of the homologous
sequences to produce acidic or neutral IPBDs.

BPTI is quite small; if this should cause a
pharmacological problem, two or more BPTI-derived
domains may be joined as in human BPTI homologues, one
of which has two domains (BALD85, ALBR83b) and another
has three (WUNT88).

Another possible pharmacological problem is
immunogenicity. BPTI has been used in humans with very few
adverse effects. Siekmann et al. (SIEK89) have studied
immunological characteristics of BPTI and some
homologues. It is an advantage of the method of the
present invention that a variety of SBDs can be obtained
so that, if one derivative proves to be antigenic, a
different SBD may be used. Furthermore, one can reduce
the probability of immune response by starting with a
human protein, such as LACI (a BPTI homologue) (WUNT88,
GIRA89) or Inter-α-Trypsin Inhibitor (ALBR83a, ALBR83b,
DIAR90, ENGH89, TRIB86, GEBH86, GEBH90, KAUM86, ODOM90,
SALI90).

Further, a BPTI-derived gene fragment, coding for a
novel binding domain, could be fused in-frame to a gene
fragment coding for other proteins, such as serum
albumin or the constant parts of IgG.

Tschesche et al. (TSCH87) reported on the binding
of several BPTI derivatives to various proteases:

Dissociation constants for BPTI derivatives, Molar.

Residue   Trypsin      Chymotrypsin  Elastase     Elastase
#15       (bovine      (bovine       (porcine     (human
          pancreas)    pancreas)     pancreas)    leukocytes)
lysine    6.0·10^-14   9.0·10^-9     -            3.5·10^-6
glycine   -            -             +            7.0·10^[...]
alanine   +            -             2.8·10^-8    2.5·10^-9
valine    -            -             5.7·10^-8    1.1·10^-10
leucine   -            -             1.9·10^-8    2.9·10^-9


From the report of Tschesche et al. we infer that
molecular pairs marked "+" have Kds > 3.5·10^-6 M and that
molecular pairs marked "-" have Kds >> 3.5·10^-6 M.
Because of the wealth of data about the binding of BPTI
and various mutants to trypsin and other proteases
(TSCH87), we can proceed in various ways in optimizing
the affinity separation conditions. (For other PBDs, we
can obtain two different monoclonal antibodies, one with
a high affinity having Kd on the order of 10^-11 M, and
one with a moderate affinity having Kd on the order of
10^-6 M.)

Works concerning BPTI and its homologues include:
KIDO88, PONT88, KIDO90, AUER87, AUER90, SCOT87b, AUER88,
AUER89, BECK88b, WACH79, WACH80, BECK89a, DUFT85,
FIOR88, GIRA89, GOLD84, GOLD88, HOCH84, RITO83, NORR89a,
NORR89b, OLTE89, SWAI88, and WAGN79.

II.F. Mini-Proteins as IPBDs:

A polypeptide is a polymer composed of a single
chain of the same or different amino acids joined by
peptide bonds. Linear peptides can take up a very large
number of different conformations through internal
rotations about the main chain single bonds of each α
carbon. These rotations are hindered to varying degrees
by side groups, with glycine interfering the least, and
valine, isoleucine and, especially, proline, the most.
A polypeptide of 20 residues may have 10^20 different
conformations which it may assume by various internal
rotations.

Proteins are polypeptides which, as a result of 
stabilizing interactions between amino acids that are 
not in adjacent positions in the chain, have folded into 
a well-defined conformation. This folding is usually 
essential to their biological activity. 

For polypeptides of 40-60 residues or longer,
noncovalent forces such as hydrogen bonds, salt bridges,
and hydrophobic interactions are sufficient to
stabilize a particular folding or conformation. The
polypeptide's constituent segments are held to more or
less that conformation unless it is perturbed by a
denaturant such as rising temperature or decreasing pH,
whereupon the polypeptide unfolds or "melts". The
smaller the peptide, the more likely it is that its
conformation will be determined by the environment. If
a small unconstrained peptide has biological activity,
the peptide ligand will be in essence a random coil
until it comes into proximity with its receptor. The
receptor accepts the peptide only in one or a few
conformations because alternative conformations are
disfavored by unfavorable van der Waals and other
non-covalent interactions.

Small polypeptides have potential advantages over 
larger polypeptides when used as therapeutic or 
diagnostic agents, including (but not limited to):

a) better penetration into tissues, 

b) faster elimination from the circulation (important 
for imaging agents) , 

c) lower antigenicity, and 

d) higher activity per mass. 

Moreover, polypeptides of under about 50 residues
have the advantage of accessibility via chemical
synthesis; polypeptides of under about 30 residues are
more easily synthesized than are larger polypeptides.
Thus, it would be desirable to be able to employ the
combination of variegation and affinity selection to
identify small polypeptides which bind a target of
choice.

Polypeptides of this size, however, have
disadvantages as binding molecules. According to
Olivera et al. (OLIV90a): "Peptides in this size range
normally equilibrate among many conformations (in order
to have a fixed conformation, proteins generally have to
be much larger)." Specific binding of a peptide to a
target molecule requires the peptide to take up one
conformation that is complementary to the binding site.
For a decapeptide with three isoenergetic conformations
(e.g., β strand, α helix, and reverse turn) at each
residue, there are about 6·10^4 possible overall
conformations. Assuming these conformations to be
equi-probable for the unconstrained decapeptide, if only
one of the possible conformations bound to the binding
site, then the affinity of the peptide for the target is
expected to be about 6·10^4 times higher if it could be
constrained to that single effective conformation.
Thus, the unconstrained decapeptide, relative to a
decapeptide constrained to the correct conformation,
would be expected to exhibit lower affinity. It would
also exhibit lower specificity, since one of the other
conformations of the unconstrained decapeptide might be
one which bound tightly to a material other than the
intended target. By way of corollary, it could have
less resistance to degradation by proteases, since it
would be more likely to provide a binding site for the
protease.
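
The figure of about 6·10^4 conformations follows from
three conformations at each of ten residues (3^10). The
sketch below reproduces that count and, as an added
illustration not stated in the text, converts the
corresponding affinity factor into a free-energy
difference using ΔG = RT ln N; the temperature of 298 K
and the variable names are assumptions.

```python
import math

CONFORMATIONS_PER_RESIDUE = 3   # e.g. beta strand, alpha helix, reverse turn
RESIDUES = 10                   # decapeptide

# Total number of overall conformations if each residue is independent.
n_conformations = CONFORMATIONS_PER_RESIDUE ** RESIDUES

# Free-energy equivalent of restricting the peptide to one conformation.
R_KCAL_PER_MOL_K = 1.987e-3     # gas constant in kcal/(mol*K)
TEMPERATURE_K = 298.0           # assumed temperature, not given in the text
penalty_kcal_per_mol = R_KCAL_PER_MOL_K * TEMPERATURE_K * math.log(n_conformations)
```

Here `n_conformations` is 59049, which the text rounds to
6·10^4, and the corresponding free-energy difference is
roughly 6.5 kcal/mol under the stated assumptions.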

In one embodiment, the present invention overcomes
these problems, while retaining the advantages of
smaller polypeptides, by fostering the biosynthesis of
novel mini-proteins having the desired binding
characteristics. Mini-proteins are small polypeptides
(usually less than about 60 residues) which, while too
small to have a stable conformation as a result of
noncovalent forces alone, are covalently crosslinked
(e.g., by disulfide bonds) into a stable conformation
and hence have biological activities more typical of
larger protein molecules than of unconstrained
polypeptides of comparable size.

When mini-proteins are variegated, the residues
which are covalently crosslinked in the parental
molecule are left unchanged, thereby stabilizing the
conformation. For example, in the variegation of a
disulfide bonded mini-protein, certain cysteines are
invariant so that under the conditions of expression and
display, covalent crosslinks (e.g., disulfide bonds
between one or more pairs of cysteines) form, and
substantially constrain the conformation which may be
adopted by the hypervariable linearly intermediate amino
acids. In other words, a constraining scaffolding is
engineered into polypeptides which are otherwise
extensively randomized.

Once a mini-protein of desired binding
characteristics is characterized, it may be produced, 
not only by recombinant DNA techniques, but also by 
nonbiological synthetic methods. 

In vitro, disulfide bridges can form spontaneously
in polypeptides as a result of air oxidation. Matters
are more complicated in vivo. Very few intracellular
proteins have disulfide bridges, probably because a
strong reducing environment is maintained by the
glutathione system. Disulfide bridges are common in
proteins that travel or operate in extracellular spaces,
such as snake venoms and other toxins (e.g., conotoxins,
charybdotoxin, bacterial enterotoxins), peptide
hormones, digestive enzymes, complement proteins,
immunoglobulins, lysozymes, protease inhibitors (BPTI
and its homologues, CMTI-III (Cucurbita maxima trypsin
inhibitor III) and its homologues, hirudin, etc.) and
milk proteins.

Disulfide bonds that close tight intrachain loops 
have been found in pepsin, thioredoxin, insulin A-chain, 
silk fibroin, and lipoamide dehydrogenase. The bridged 
cysteine residues are separated by one to four residues 
along the polypeptide chain. Model building, X-ray 
diffraction analysis, and NMR studies have shown that 
the α carbon path of such loops is usually flat and
rigid. 

There are two types of disulfide bridges in
immunoglobulins. One is the conserved intrachain
bridge, spanning about 60 to 70 amino acid residues and
found, repeatedly, in almost every immunoglobulin
domain. Buried deep between the opposing β sheets,
these bridges are shielded from solvent and ordinarily
can be reduced only in the presence of denaturing
agents. The remaining disulfide bridges are mainly
interchain bonds and are located on the surface of the
molecule; they are accessible to solvent and relatively
easily reduced (STEI85). The disulfide bridges of the
mini-proteins of the present invention are intrachain
linkages between cysteines having much smaller chain
spacings.

For the purpose of the appended claims, a
mini-protein has between about eight and about sixty
residues. However, it will be understood that a
chimeric surface protein presenting a mini-protein as a
domain will normally have more than sixty residues.
Polypeptides containing intrachain disulfide bonds may
be characterized as cyclic in nature, since a closed
circle of covalently bonded atoms is defined by the two
cysteines, the intermediate amino acid residues, their
peptidyl bonds, and the disulfide bond. The terms
"cycle", "span" and "segment" will be used to define
certain structural features of the polypeptides. An
intrachain disulfide bridge connecting amino acids 3 and
8 of a 16 residue polypeptide will be said herein to
have a cycle of 6 and a span of 4. If amino acids 4 and
12 are also disulfide bonded, then they form a second
cycle of 9 with a span of 7. Together, the four
cysteines divide the polypeptide into four
intercysteine segments (1-2, 5-7, 9-11, and 13-16). (Note
that there is no segment between Cys3 and Cys4.)
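
The definitions of cycle, span and segment can be checked
against the 16-residue example with a short sketch. The
function names are hypothetical; the formulas (cycle
counts both cysteines, span counts only the residues
between them) are inferred from the numbers given above.

```python
def cycle_and_span(cys_a: int, cys_b: int) -> tuple:
    """Return (cycle, span) for a disulfide between two 1-based positions."""
    low, high = sorted((cys_a, cys_b))
    cycle = high - low + 1   # bridged residues including both cysteines
    span = high - low - 1    # residues strictly between the cysteines
    return cycle, span


def intercysteine_segments(length: int, cysteines) -> list:
    """Return (start, end) ranges of residues lying between or outside cysteines."""
    segments = []
    start = 1
    for position in sorted(cysteines):
        if position > start:
            segments.append((start, position - 1))
        start = position + 1
    if start <= length:
        segments.append((start, length))
    return segments


first_bridge = cycle_and_span(3, 8)
second_bridge = cycle_and_span(4, 12)
segments = intercysteine_segments(16, [3, 8, 4, 12])
```

For the example in the text this gives a cycle of 6 and a
span of 4 for the 3-8 bridge, a cycle of 9 and a span of 7
for the 4-12 bridge, and the four segments 1-2, 5-7, 9-11
and 13-16.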

The connectivity pattern of a crosslinked
mini-protein is a simple description of the relative
location of the termini of the crosslinks. For example,
for a mini-protein with two disulfide bonds, the
connectivity pattern "1-3, 2-4" means that the first
crosslinked cysteine is disulfide bonded to the third
crosslinked cysteine (in the primary sequence), and the
second to the fourth.
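
A connectivity pattern can be turned into explicit residue
pairs once the cysteine positions are known. The sketch
below is illustrative; the function name is hypothetical,
and the positions used (3, 4, 8 and 12) are those of the
16-residue example given earlier in this section.

```python
def disulfide_pairs(pattern: str, cysteine_positions) -> list:
    """Map a pattern such as '1-3, 2-4' onto sequence positions of cysteines."""
    ordered = sorted(cysteine_positions)      # order in the primary sequence
    pairs = []
    for link in pattern.split(","):
        first, second = (int(index) for index in link.strip().split("-"))
        # Pattern indices are 1-based ranks among the crosslinked cysteines.
        pairs.append((ordered[first - 1], ordered[second - 1]))
    return pairs


example_pairs = disulfide_pairs("1-3, 2-4", [3, 4, 8, 12])
```

With these positions the pattern "1-3, 2-4" corresponds to
disulfides between residues 3 and 8 and between residues 4
and 12.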

The degree to which the crosslink constrains the
conformational freedom of the mini-protein, and the
degree to which it stabilizes the mini-protein, may be
assessed by a number of means. These include absorption
spectroscopy (which can reveal whether an amino acid is
buried or exposed), circular dichroism studies (which
provide a general picture of the helical content of the
protein), nuclear magnetic resonance imaging (which
reveals the number of nuclei in a particular chemical
environment as well as the mobility of nuclei), and
X-ray or neutron diffraction analysis of protein
crystals. The stability of the mini-protein may be
ascertained by monitoring the changes in absorption at
various wavelengths as a function of temperature, pH,
etc.; buried residues become exposed as the protein
unfolds. Similarly, the unfolding of the mini-protein as
a result of denaturing conditions results in changes in
NMR line positions and widths. Circular dichroism (CD)
spectra are extremely sensitive to conformation.

The variegated disulfide-bonded mini-proteins of
the present invention fall into several classes.

Class I mini-proteins are those featuring a single
pair of cysteines capable of interacting to form a
disulfide bond, said bond having a span of no more than
nine residues. This disulfide bridge preferably has a
span of at least two residues; this is a function of the
geometry of the disulfide bond. When the spacing is two
or three residues, one residue is preferably glycine in
order to reduce the strain on the bridged residues. The
upper limit on spacing is less precise; however, in
general, the greater the spacing, the less the
constraint on conformation imposed on the linearly
intermediate amino acid residues by the disulfide bond.

The main chain of such a peptide has very little 
freedom, but is not stressed. The free energy released 
when the disulfide forms exceeds the free energy lost by 
the main-chain when locked into a conformation that 
brings the cysteines together. Having lost the free 
energy of disulfide formation, the proximal ends of the 
side groups are held in more or less fixed relation to 
each other. When binding to a target, the domain does 
not need to expend free energy getting into the correct 
conformation. The domain cannot jump into some other
conformation and bind a non-target. 

A disulfide bridge with a span of 4 or 5 is 
especially preferred. If the span is increased to 6, 
the constraining influence is reduced. In this case, we 
prefer that at least one of the enclosed residues be an 
amino acid that imposes restrictions on the main-chain 
geometry. Proline imposes the most restriction. Valine 
and isoleucine restrict the main chain to a lesser 
extent. The preferred position for this constraining
non-cysteine residue is adjacent to one of the invariant
cysteines; however, it may be one of the other bridged
residues. If the span is seven, we prefer to include
two amino acids that limit main-chain conformation. 
These amino acids could be at any of the seven 
positions, but are preferably the two bridged residues 
that are immediately adjacent to the cysteines. If the 
span is eight or nine, additional constraining amino 
acids may be provided. 

The disulfide bond of a class I mini-protein is
exposed to solvent. Thus, one should avoid exposing the
variegated population of GPs that display class I
mini-proteins to reagents that rupture disulfides; Creighton
names several such reagents (CREI88).

Class II mini-proteins are those featuring a single 
disulfide bond having a span of greater than nine amino 
acids. The bridged amino acids form secondary
structures which help to stabilize their conformation.
Preferably, these intermediate amino acids form hairpin
supersecondary structures such as those schematized
below:



-Cys-α helix-turn-β strand-Cys-
  |________S-S________|

-Cys-α helix-turn-α helix-Cys-
  |________S-S________|

-Cys-β strand-turn-β strand-Cys-

Secondary structures are stabilized by hydrogen bonds
between amide nitrogen and carbonyl groups, by
interactions between charged side groups and helix dipoles, and
by van der Waals contacts. One abundant secondary
structure in proteins is the α-helix. The α helix has
3.6 residues per turn, a 1.5 Å rise per residue, and a
helical radius of 2.3 Å. All observed α-helices are
right-handed. The torsion angles φ (-57°) and ψ (-47°)
are favorable for most residues, and the hydrogen
bond between the backbone carbonyl oxygen of each
residue and the backbone NH of the fourth residue along
the chain is 2.86 Å long (nearly the optimal distance)
and virtually straight. Since the hydrogen bonds all
point in the same direction, the α helix has a
considerable dipole moment (carboxy terminus negative).

The β strand may be considered an elongated helix
with 2.3 residues per turn, a translation of 3.3 Å per
residue, and a helical radius of 1.0 Å. Alone, a β
strand forms no main-chain hydrogen bonds. Most
commonly, β strands are found in twisted (rather than
planar) parallel, antiparallel, or mixed
parallel/antiparallel sheets.

A peptide chain can form a sharp reverse turn. A
reverse turn may be accomplished with as few as four
amino acids. Reverse turns are very abundant,
comprising a quarter of all residues in globular
proteins. In proteins, reverse turns commonly connect β
strands to form β sheets, but may also form other
connections. A peptide can also form other turns that
are less sharp.

Based on studies of known proteins, one may
calculate the propensity of a particular residue, or of
a particular dipeptide or tripeptide, to be found in an
α helix, β strand or reverse turn. The normalized
frequencies of occurrence of the amino acid residues in
these secondary structures are given in Table 6-4 of
CREI84. For a more detailed treatment of the prediction
of secondary structure from the amino acid sequence, see
Chapter 6 of SCHU79.

In designing a suitable hairpin structure, one may 
copy an actual structure from a protein whose three- 
dimensional conformation is known, design the structure 
using frequency data, or combine the two approaches.
Preferably, one or more actual structures are used as a 
model, and the frequency data is used to determine which 
mutations can be made without disrupting the structure. 

Preferably, no more than three amino acids lie
between the cysteine and the beginning or end of the α
helix or β strand.

More complex structures (such as a double hairpin) 
are also possible. 

Class III mini-proteins are those featuring a
plurality of disulfide bonds. They optionally may also
feature secondary structures such as those discussed
above with regard to Class II mini-proteins. Since the
number of possible disulfide bond topologies increases
rapidly with the number of bonds (two bonds, three
topologies; three bonds, 15 topologies; four bonds, 105
topologies), the number of disulfide bonds preferably
does not exceed four. With two or more disulfide bonds,
the disulfide bridge spans preferably do not exceed 50,
and the largest intercysteine chain segment preferably
does not exceed 20.
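
The topology counts quoted above (3, 15, 105) are the number of ways in which 2n cysteines can be paired into n disulfides, (2n)!/(2^n n!). The following is a minimal sketch that reproduces those figures; the function name is ours and is not part of the specification.

```python
from math import factorial


def disulfide_topologies(n_bonds: int) -> int:
    """Count the ways to pair 2*n_bonds cysteines into n_bonds disulfides.

    This is the number of perfect matchings on 2n points,
    (2n)! / (2**n * n!), also written (2n - 1)!!.
    """
    if n_bonds < 0:
        raise ValueError("n_bonds must be non-negative")
    return factorial(2 * n_bonds) // (2 ** n_bonds * factorial(n_bonds))


# Reproduce the counts given in the text for two, three and four bonds.
for n in range(1, 6):
    print(n, disulfide_topologies(n))
```

The fifth value printed shows how quickly the count grows beyond four bonds, which is the stated reason for preferring no more than four.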

Naturally occurring class III mini-proteins, such
as heat-stable enterotoxin ST-Ia, frequently have pairs
of cysteines that are adjacent in the amino-acid
sequence. Adjacent cysteines are very unlikely to form
an intramolecular disulfide, and cysteines separated by a
single amino acid form an intramolecular disulfide with
difficulty and only for certain intervening amino acids.
Thus, clustering cysteines within the amino-acid 
sequence reduces the number of realizable disulfide 
bonding schemes. We utilize such clustering in the 
class III mini-protein disclosed herein. 

Metal Finger Mini-Proteins. The mini-proteins of
the present invention are not limited to those
crosslinked by disulfide bonds. Another important class
of mini-proteins are analogues of finger proteins.
Finger proteins are characterized by finger structures
in which a metal ion is coordinated by two Cys and two
His residues, forming a tetrahedral arrangement around
it. The metal ion is most often zinc(II), but may be
iron, copper, cobalt, etc. The "finger" has the
consensus sequence (Phe or Tyr)-(1 AA)-Cys-(2-4 AAs)-
Cys-(3 AAs)-Phe-(5 AAs)-Leu-(2 AAs)-His-(3 AAs)-His-(5
AAs) (SEQ ID NOs:1,2,3,4,5,6) (BERG88; GIBS88). While
finger proteins typically contain many repeats of the
finger motif, it is known that a single finger will fold
in the presence of zinc ions (FRAN87; PARR88). There is
some dispute as to whether two fingers are necessary for
binding to DNA. The present invention encompasses
mini-proteins with either one or two fingers. It is to be
understood that the target need not be a nucleic acid.

G. Modified PBSs

There exist a number of enzymes and chemical
reagents that can selectively modify certain side groups
of proteins, including: protein-tyrosine kinase,
Ellman's reagent, methyl transferases (that methylate GLU
side groups), serine kinases, proline hydroxylases,
vitamin-K dependent enzymes that convert GLU to GLA,
maleic anhydride, and alkylating agents. Treatment of
the variegated population of GP(PBD)s with one of these
enzymes or reagents will modify the side groups affected
by the chosen enzyme or reagent. Enzymes and reagents
that do not kill the GP are much preferred. Such
modification of side groups can directly affect the
binding properties of the displayed PBDs. Using
affinity separation methods, we enrich for the modified
GPs that bind the predetermined target. Since the
active binding domain is not entirely genetically
specified, we must repeat the post-morphogenesis
modification at each enrichment round. This approach is
particularly appropriate with mini-protein IPBDs because
we envision chemical synthesis of these SBDs.

III. VARIEGATION STRATEGY: MUTAGENESIS TO OBTAIN
POTENTIAL BINDING DOMAINS WITH DESIRED DIVERSITY

III.A. Generally

Using standard genetic engineering techniques, a
molecule of variegated DNA can be introduced into a
vector so that it constitutes part of a gene (OLIP86,
OLIP87, AUSU87, REID88a). When vectors containing
variegated DNA are used to transform bacteria, each cell
makes a version of the original protein. Each colony of
bacteria may produce a different version from any other
colony. If the variegations of the DNA are concentrated
at loci known to be on the surface of the protein or in
a loop, a population of proteins will be generated,
many members of which will fold into roughly the same 3D
structure as the parent protein. The specific binding
properties of each member, however, may be different
from those of each other member.

We now consider the manner in which we generate a 
diverse population of potential binding domains in order 
to facilitate selection of a PBD-bearing GP which binds 
with the requisite affinity to the target of choice. 
The potential binding domains are first designed at the
amino acid level. Once we have identified which
residues are to be mutagenized, and which mutations to
allow at those positions, we may then design the
variegated DNA which is to encode the various PBDs so as
to assure that there is a reasonable probability that if
a PBD has an affinity for the target, it will be
detected. Of course, the number of independent
transformants obtained and the sensitivity of the
affinity separation technology will impose limits on the
extent of variegation possible within any single round
of variegation.

There are many ways to generate diversity in a
protein. (See RICH86, CARU85, and OLIP86.) At one
extreme, we vary a few residues of the protein as much
as possible (inter alia see CARU85, CARU87, RICH86, and
WHAR86). We will call this approach "Focused
Mutagenesis". A typical "Focused Mutagenesis" strategy
is to pick a set of five to seven residues and vary each
through 13-20 possibilities. An alternative plan of
mutagenesis ("Diffuse Mutagenesis") is to vary many more
residues through a more limited set of choices (see
VERS86a and PAKU86). The variegation pattern adopted
may fall between these extremes, e.g., two residues
varied through all twenty amino acids, two more through
only two possibilities, and a fifth into ten of the
twenty amino acids.
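
To make the scale of these strategies concrete, the number of distinct amino-acid sequences in a library is the product of the number of choices at each varied residue. The sketch below is illustrative only; the function name is hypothetical, and the calculation counts amino-acid sequences, not DNA sequences.

```python
from math import prod


def variegation_level(choices_per_residue):
    """Return the number of distinct amino-acid sequences.

    choices_per_residue lists, for each varied residue, how many
    amino acids are allowed there; the level is their product.
    """
    if any(c < 1 for c in choices_per_residue):
        raise ValueError("each varied residue needs at least one choice")
    return prod(choices_per_residue)


# Intermediate pattern from the text: two residues through all twenty
# amino acids, two through two possibilities, one through ten.
mixed = variegation_level([20, 20, 2, 2, 10])

# Extremes of the "Focused Mutagenesis" range quoted in the text.
focused_low = variegation_level([13] * 5)
focused_high = variegation_level([20] * 7)

print(mixed, focused_low, focused_high)
```

The focused strategy thus spans several orders of magnitude, depending on how many residues are varied and how fully.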

There is no fixed limit on the number of codons 
which can be mutated simultaneously. However, it is 
desirable to adopt a mutagenesis strategy which results 
in a reasonable probability that a possible PBD sequence 
is in fact displayed by at least one genetic package. 
When the size of the set of amino acids potentially
encoded by each variable codon is the same for all
variable codons and within the set all amino acids are
equiprobable, this probability may be calculated as
follows: Let T(k,q) be the probability that amino acid
number k will occur at variegated codon q; these codons
need not be contiguous. The probability that a
particular vgDNA molecule will encode a PBD containing n
variegated amino acids k_1, ..., k_n is:

$$p(k_1, \ldots, k_n) = T(k_1, 1) \cdots T(k_n, n)$$

Consider a library of N_it independent transformants
prepared with said vgDNA; the probability that the
sequence k_1, ..., k_n is absent is:

$$P(\text{missing } k_1, \ldots, k_n) = \exp\{-N_{it} \cdot p(k_1, \ldots, k_n)\}$$

$$P(k_1, \ldots, k_n \text{ in lib}) = 1 - \exp\{-N_{it} \cdot p(k_1, \ldots, k_n)\}$$

Preferably, the probability that a mutein encoded by the 
vgDNA and composed of the least favored amino acids at 
each variegated position will be displayed by at least 
one independent transformant in the library is at least
0.50, and more preferably at least 0.90. (Muteins 
composed of more favored amino acids would of course be 
more likely to occur in the same library.) 
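
The relation between library size and the chance of displaying a given sequence can be sketched numerically as follows. This is a minimal illustration of the formulas above, not a prescribed procedure; the function names are ours, and the per-codon probability of 1/32 and the library size of 10^8 used in the example are purely hypothetical values.

```python
import math


def sequence_probability(per_codon_probs):
    """p(k_1..k_n): product of T(k_q, q) over the variegated codons."""
    p = 1.0
    for t in per_codon_probs:
        if not 0.0 < t <= 1.0:
            raise ValueError("each T(k, q) must lie in (0, 1]")
        p *= t
    return p


def prob_in_library(n_transformants, p_seq):
    """P(sequence in lib) = 1 - exp(-N_it * p)."""
    return -math.expm1(-n_transformants * p_seq)


def transformants_needed(p_seq, target):
    """Smallest N_it for which P(sequence in lib) reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    return math.ceil(-math.log1p(-target) / p_seq)


# Hypothetical example: five variegated codons, with the least favored
# amino acid at each position occurring with probability 1/32.
p_least = sequence_probability([1 / 32] * 5)
coverage = prob_in_library(1e8, p_least)
n_for_90 = transformants_needed(p_least, 0.90)
print(p_least, coverage, n_for_90)
```

Under these assumed numbers, the least favored sequence clears the "more preferably at least 0.90" criterion; adding a sixth such codon would not, unless the number of transformants were raised correspondingly.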

Preferably, the variegation is such as will cause a 
typical transformant population to display 10 -10 
different amino acid sequences by means of preferably 
not more than 10-fold more (more preferably not more
than 3-fold) different DNA sequences. 

For a mini-protein that lacks α helices and β
strands, one will, in any given round of mutation,
preferably variegate each of 4-6 non-cysteine codons so
that they each encode at least eight of the 20 possible
amino acids. The variegation at each codon could be
customized to that position. Preferably, cysteine is
not one of the potential substitutions, though it is not
excluded.

When the mini-protein is a metal finger protein, in 
a typical variegation strategy, the two Cys and two His 
residues, and optionally also the aforementioned 
Phe/Tyr, Phe and Leu residues, are held invariant and a
plurality (usually 5-10) of the other residues are 
varied. 

When the mini-protein is of the type featuring one
or more α helices and β strands, the set of potential
amino acid modifications at any given position is picked 
to favor those which are less likely to disrupt the 
secondary structure at that position. Since the number 
of possibilities at each variable amino acid is more 
limited, the total number of variable amino acids may be 
greater without altering the sampling efficiency of the 
selection process. 

For the last-mentioned class of mini-proteins, as
well as domains other than mini-proteins, preferably not
more than 20 and more preferably 5-10 codons will be
variegated. However, if diffuse mutagenesis is
employed, the number of codons which are variegated can
be higher.

The decision as to which residues to modify is
eased by knowledge of which residues lie on the surface
of the domain and which are buried in the interior.

We choose residues in the IPBD to vary through
consideration of several factors, including: a) the 3D
structure of the IPBD, b) sequences homologous to IPBD,
and c) modeling of the IPBD and mutants of the IPBD.
When the number of residues that could strongly
influence binding is greater than the number that should
be varied simultaneously, the user should pick a subset
of those residues to vary at one time. The user picks
trial levels of variegation and calculates the abundances
of various sequences. The list of varied residues and
the level of variegation at each varied residue are
adjusted until the composite variegation is commensurate
with the sensitivity of the affinity separation and the
number of independent transformants that can be made.

Preferably, the abundance of PPBD-encoding DNA is
[...] to 10 times higher than both 1/M_ntv and 1/C_sensi to provide
a margin of redundancy. M_ntv is the number of
transformants that can be made from DNA. With
current technology M_ntv is approximately 5·10^8, but the
exact value depends on the details of the procedures
adopted by the user. Improvements in technology that
allow more efficient a) synthesis of DNA, b) ligation
of DNA, or c) transformation of cells will raise the
value of M_ntv. C_sensi is the sensitivity of the affinity
separation; improvements in affinity separation will
raise C_sensi. If the smaller of M_ntv and C_sensi is
increased, higher levels of variegation may be used.
For example, if C_sensi is 1 in 10^9 and M_ntv is 10^8, then
improvements in C_sensi are less valuable than improvements
in M_ntv.
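
The point that the smaller of the two quantities sets the limit can be illustrated with a short calculation. The sketch below rests on assumptions of ours: all sequences are taken to be equiprobable (so the abundance of each is 1/variegation), the sensitivity "1 in 10^9" is entered as 10^9, and a redundancy factor of 10 (the upper figure mentioned above) is used. The function name is hypothetical.

```python
def max_variegation(m_ntv, c_sensi, redundancy=10.0):
    """Largest number of equiprobable sequences whose individual
    abundance (1 / variegation) is still `redundancy` times higher
    than both 1 / m_ntv and 1 / c_sensi."""
    if min(m_ntv, c_sensi, redundancy) <= 0:
        raise ValueError("all arguments must be positive")
    return min(m_ntv, c_sensi) / redundancy


# Example from the text: C_sensi is 1 in 10^9 and M_ntv is 10^8.
baseline = max_variegation(m_ntv=1e8, c_sensi=1e9)

# A tenfold better affinity separation leaves the limit unchanged...
better_separation = max_variegation(m_ntv=1e8, c_sensi=1e10)

# ...whereas a tenfold gain in transformants raises it tenfold.
better_transformation = max_variegation(m_ntv=1e9, c_sensi=1e9)

print(baseline, better_separation, better_transformation)
```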

While variegation normally will involve the 
substitution of one amino acid for another at a 
designated variable codon, it may involve the insertion 
or deletion of amino acids as well. 

III.B. Identification of Residues to be Varied

We now consider the principles that guide our 
choice of residues of the IPBD to vary. A key concept 
is that only structured proteins exhibit specific 
binding, i.e. can bind to a particular chemical entity 
to the exclusion of most others. Thus the residues to 
be varied are chosen with an eye to preserving the 
underlying IPBD structure. Substitutions that prevent 
the PBD from folding will cause GPs carrying those genes 
to bind indiscriminately so that they can easily be 
removed from the population. 

Sauer and colleagues (PAKU86, REID88a), and
Caruthers and colleagues (EISE85) have shown that some
residues on the polypeptide chain are more important
than others in determining the 3D structure of a
protein. The 3D structure is essentially unaffected by
the identity of the amino acids at some loci; at other
loci only one or a few types of amino acid are allowed.
In most cases, loci where wide variety is allowed have
the amino acid side group directed toward the solvent.
Loci where limited variety is allowed frequently have
the side group directed toward other parts of the
protein. Thus substitutions of amino acids that are
exposed to solvent are less likely to affect the 3D
structure than are substitutions at internal loci. (See
also SCHU79, p169-171 and CREI84, p239-245, 314-315.)

The residues that join helices to helices, helices
to sheets, and sheets to sheets are called turns and
loops and have been classified by Richardson (RICH81),
Thornton (THOR88), Sutcliffe et al. (SUTC87a) and
others. Insertions and deletions are more readily
tolerated in loops than elsewhere. Thornton et al.
(THOR88) have summarized many observations indicating
that related proteins usually differ most at the loops
which join the more regular elements of secondary
structure. (These observations are relevant not only to
the variegation of potential binding domains but also to
the insertion of binding domains into an outer surface
protein of a genetic package, as discussed in a later
section.)

Burial of hydrophobic surfaces so that bulk water 
is excluded is one of the strongest forces driving the 
binding of proteins to other molecules. Bulk water can 
be excluded from the region between two molecules only 
if the surfaces are complementary. We should test as 
many surface variations as possible to find one that is 
complementary to the target. The selection-through-binding
isolates those proteins that are more nearly
complementary to some surface on the target.

Proteins do not have distinct, countable faces. 
Therefore we define an "interaction set" to be a set of
residues such that all members of the set can 
simultaneously touch one molecule of the target material 
without any atom of the target coming closer than van 
der Waals distance to any main-chain atom of the IPBD. 
The concept of a residue "touching" a molecule of the 
target is discussed below. From a picture of BPTI (such 
as Figure 6-10, p. 225 of CREI84) we can see that 
residues 3, 7, 8, 10, 13, 39, 41, and 42 can all 
simultaneously contact a molecule the size and shape of 
myoglobin. We also see that residue 49 cannot touch a
single myoglobin molecule simultaneously with any of the 
first set even though all are on the surface of BPTI. 
(It is not the intent of the present invention, however, 
to suggest that use of models is required to determine 
which part of the target molecule will actually be the 
site of binding by PBD.) 

Variations in the position, orientation and nature 
of the side chains of the residues of the interaction 
set will alter the shape of the potential binding 
surface defined by that set. Any individual combination 
of such variations may result in a surface shape which 
is a better or a worse fit for the target surface. The 
effective diversity of a variegated population is
measured by the number of distinct shapes the
potentially complementary surfaces of the PBD can adopt,
rather than the number of protein sequences. Thus, it
is preferable to maximize the former number, when our
knowledge of the IPBD permits us to do so.

To maximize the number of surface shapes generated
when N residues are varied, all residues varied in a
given round of variegation should be in the same 
interaction set because variation of several residues in 
one interaction set generates an exponential number of 
different shapes of the potential binding surface. 

If cassette mutagenesis is to be used to introduce 
the variegated DNA into the ipbd gene, the protein 
residues to be varied are, preferably, close enough 
together in sequence that the variegated DNA (vgDNA) 
encoding all of them can be made in one piece. The
present invention is not limited to a particular length
of vgDNA that can be synthesized. With current
technology, a stretch of 60 amino acids (180 DNA bases)
can be spanned.

Further, when there is reason to mutate residues
more than sixty residues apart, one can use other
mutational means, such as single-stranded-
oligonucleotide-directed mutagenesis (BOTS85) using two
or more mutating primers.

Alternatively, to vary residues separated by more
than sixty residues, two cassettes may be mutated as
follows: 1) vgDNA having a low level of variegation
(for example, 20 to 400 fold variegation) is introduced
into one cassette in the OCV, 2) cells are transformed
and cultured, 3) vg OCV DNA is obtained, 4) a second
segment of vgDNA is inserted into a second cassette in
the OCV, and 5) cells are transformed and cultured, GPs
are harvested and subjected to selection-through-
binding.

The composite level of variation preferably does 
not exceed the prevailing capabilities to a) produce 
very large numbers of independently transformed cells or 
b) detect small components in a highly varied 
population. The limits on the level of variegation are 
discussed later.

Data about the IPBD and the target that are useful 
in deciding which residues to vary in the variegation 
cycle include: 1) 3D structure, or at least a list of 
residues on the surface of the IPBD, 2) list of 
sequences homologous to IPBD, and 3) model of the target 
molecule or a stand-in for the target. 

These data and an understanding of the behavior of
different amino acids in proteins will be used to answer
two questions:

1) which residues of the IPBD are on the outside and
close enough together in space to touch the target
simultaneously?

2) which residues of the IPBD can be varied with high
probability of retaining the underlying IPBD
structure?

Although an atomic model of the target material
(obtained through X-ray crystallography, NMR, or other 
means) is preferred in such examination, it is not 
necessary. For example, if the target were a protein of 
unknown 3D structure, it would be sufficient to know the 
molecular weight of the protein and whether it were a 
soluble globular protein, a fibrous protein, or a 
membrane protein. Physical measurements, such as
low-angle neutron diffraction, can determine the overall
molecular shape, viz. the ratios of the principal
moments of inertia. One can then choose a protein of
known structure of the same class and similar size and 
shape to use as a molecular stand-in and yardstick. It 
is not essential to measure the moments of inertia of 
the target because, at low resolution, all proteins of a 
given size and class look much the same. The specific 
volumes are the same, all are more or less spherical and 
therefore all proteins of the same size and class have 
about the same radius of curvature. The radii of 
curvature of the two molecules determine how much of the 
two molecules can come into contact. 

The most appropriate method of picking the residues
of the protein chain at which the amino acids should be
varied is by viewing, with interactive computer
graphics, a model of the IPBD. A stick-figure
representation of molecules is preferred. A suitable
set of hardware is an Evans & Sutherland PS390 graphics
terminal (Evans & Sutherland Corporation, Salt Lake
City, UT) and a MicroVAX II supermicro computer (Digital
Equipment Corp., Maynard, MA). The computer should,
preferably, have at least 150 megabytes of disk storage,
so that the Brookhaven Protein Data Bank can be kept
on-line. A FORTRAN compiler, or some equally good
higher-level language processor, is preferred for program
development. Suitable programs for viewing and
manipulating protein models include: a) PS-FRODO,
written by T. A. Jones (JONE85) and distributed by the
Biochemistry Department of Rice University, Houston, TX;
and b) PROTEUS, developed by Dayringer, Tramontano, and
Fletterick (DAYR86). Important features of PS-FRODO
and PROTEUS that are needed to view and manipulate
protein models for the purposes of the present invention
are the abilities to: 1) display molecular stick
figures of proteins and other molecules, 2) zoom and
clip images in real time, 3) prepare various abstract
representations of the molecules, such as a line joining
Cαs and side group atoms, 4) compute and display
solvent-accessible surfaces reasonably quickly, 5) point to and
identify atoms, and 6) measure distance between atoms.

In addition, one could use theoretical 
calculations, such as dynamic simulations of proteins, 
to estimate whether a substitution at a particular 
residue of a particular amino-acid type might produce a 
protein of approximately the same 3D structure as the 
parent protein. Such calculations might also indicate
whether a particular substitution will greatly affect
the flexibility of the protein; calculations of this
sort may be useful but are not required.

Residues whose mutagenesis is most likely to affect 
binding to a target molecule, without destabilizing the 
protein, are called the "principal set". Using the 
knowledge of which residues are on the surface of the 
IPBD (as noted above), we pick residues that are close
enough together on the surface of the IPBD to touch a
molecule of the target simultaneously without having any
IPBD main-chain atom come closer than van der Waals
distance (viz. 4.0 to 5.0 Å) to any target atom. For
the purposes of the present invention, a residue of the
IPBD "touches" the target if: a) a main-chain atom is
within van der Waals distance, viz. 4.0 to 5.0 Å, of any
atom of the target molecule, or b) the Cβ is within D_cutoff
of any atom of the target molecule so that a side-group
atom could make contact with that atom.

Because side groups differ in size (cf. Table 35),
some judgment is required in picking D_cutoff. In the
preferred embodiment, we will use D_cutoff = 8.0 Å, but
other values in the range 6.0 Å to 10.0 Å could be used.
If IPBD has G at a residue, we construct a pseudo Cβ with
the correct bond distance and angles and judge the
ability of the residue to touch the target from this
pseudo Cβ.
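
The "touching" criterion just stated lends itself to a simple distance test. The following is a minimal sketch under assumptions of ours: atoms are given as (x, y, z) tuples in Å, the upper end (5.0 Å) of the quoted van der Waals range is taken as the main-chain threshold, and the function and parameter names are hypothetical. For glycine, the caller would pass the constructed pseudo Cβ.

```python
import math

VDW_DISTANCE = 5.0  # Å; upper end of the 4.0 to 5.0 Å range in the text
D_CUTOFF = 8.0      # Å; preferred value, 6.0 to 10.0 Å also allowed


def touches(main_chain_atoms, c_beta, target_atoms,
            vdw=VDW_DISTANCE, d_cutoff=D_CUTOFF):
    """Return True if a residue 'touches' the target.

    a) some main-chain atom lies within `vdw` of a target atom, or
    b) the C-beta (or pseudo C-beta for Gly) lies within `d_cutoff`
       of a target atom.
    """
    for t in target_atoms:
        if any(math.dist(a, t) <= vdw for a in main_chain_atoms):
            return True
        if math.dist(c_beta, t) <= d_cutoff:
            return True
    return False


# Toy coordinates, for illustration only.
main_chain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
c_beta = (2.5, 1.0, 0.0)
print(touches(main_chain, c_beta, [(9.0, 0.0, 0.0)]))
print(touches(main_chain, c_beta, [(15.0, 0.0, 0.0)]))
```

Building the principal set would additionally require rejecting placements in which a main-chain atom of the IPBD comes closer than van der Waals distance to the target; that check is not shown here.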

Alternatively, we choose a set of residues on the
surface of the IPBD such that the curvature of the
surface defined by the residues in the set is not so
great that it would prevent contact between all residues
in the set and a molecule of the target. This method is
appropriate if the target is a macromolecule, such as a 
protein, because the PBDs derived from the IPBD will 
contact only a part of the macromolecular surface. The 
surfaces of macromolecules are irregular with varying 
curvatures. If we pick residues that define a surface 
that is not too convex, then there will be a region on a 
macromolecular target with a compatible curvature. 

In addition to the geometrical criteria, we prefer 
that there be some indication that the underlying IPBD 
structure will tolerate substitutions at each residue in 
the principal set of residues. Indications could come
from various sources, including: a) homologous
sequences, b) static computer modeling, or c) dynamic
computer simulations.

The residues in the principal set need not be 
contiguous in the protein sequence and usually are not. 
The exposed surfaces of the residues to be varied do not 
need to be connected. We desire only that the amino 
acids in the residues to be varied all be capable of 
touching a molecule of the target material 
simultaneously without having atoms overlap. If the 
target were, for example, horse heart myoglobin, and if
the IPBD were BPTI, any set of residues in one
interaction set of BPTI defined in Table 34 could be
picked.

The secondary set comprises those residues not in
the primary set that touch residues in the primary set. 
These residues might be excluded from the primary set 
because: a) the residue is internal, b) the residue is 
highly conserved, or c) the residue is on the surface, 
but the curvature of the IPBD surface prevents the 
residue from being in contact with the target at the 
same time as one or more residues in the primary set. 

Internal residues are frequently conserved and the 
amino acid type cannot be changed to a significantly
different type without substantial risk that the protein
structure will be disrupted. Nevertheless, some
conservative changes of internal residues, such as I to
L or F to Y, are tolerated. Such conservative changes
subtly affect the placement and dynamics of adjacent 
protein residues and such "fine tuning" may be useful 
once an SBD is found. 

Surface residues in the secondary set are most 
often located on the periphery of the principal set. 
Such peripheral residues cannot make direct contact
with the target simultaneously with all the other
residues of the principal set. The charge on the amino
acid in one of these residues could, however, have a 
strong effect on binding. Once an SBD is found, it is 
appropriate to vary the charge of some or all of these 
residues. For example, the variegated codon containing
equimolar A and G at base 1, equimolar C and A at base
2, and A at base 3 yields amino acids T, A, K, and E
with equal probability.
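
The substitution set produced by a given base mixture can be worked out mechanically from the standard genetic code. The sketch below is illustrative; the function name and the dictionary representation of base mixtures are our own choices, and the codon table is the standard code written in T, C, A, G order.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' is stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def substitution_set(mix1, mix2, mix3):
    """Amino-acid distribution encoded by one variegated codon.

    Each argument maps a base to its fraction at that codon position.
    """
    dist = defaultdict(float)
    for (b1, f1), (b2, f2), (b3, f3) in product(
            mix1.items(), mix2.items(), mix3.items()):
        dist[CODON_TABLE[b1 + b2 + b3]] += f1 * f2 * f3
    return dict(dist)


# Codon from the text: equimolar A/G, then equimolar C/A, then A.
charge_codon = substitution_set({"A": 0.5, "G": 0.5},
                                {"C": 0.5, "A": 0.5},
                                {"A": 1.0})
print(charge_codon)
```

This codon allows one neutral polar, one small neutral, one positive and one negative residue, which is why it suits varying the charge at a peripheral position.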

The assignment of residues to the primary and 
secondary sets may be based on: a) geometry of the IPBD 
and the geometrical relationship between the IPBD and 
the target (or a stand-in for the target) in a 
hypothetical complex, and b) sequences of proteins 
homologous to the IPBD. However, it should be noted 
that the distinction between the principal set and the 
secondary set is one more of convenience than of 
substance; we could just as easily have assigned each 
amino acid residue in the domain a preference score that 
weighed together the different considerations affecting 
whether they are suitable for variegation, and then 
ranked the residues in order, from most preferred to 
least.

For any given round of variegation, it may be 
necessary to limit the variegation to a subset of the 
residues in the primary and secondary sets, based on 
geometry and on the maximum allowed level of variegation 
that assures progressivity. The allowed level of
variegation determines how many residues can be varied
at once; geometry determines which ones.

The user may pick residues to vary in many ways.
For example, pairs of residues are picked that are
diametrically opposed across the face of the principal
set. Two such pairs are used to delimit the surface,
up/down and right/left. Alternatively, three residues
that form an inscribed triangle, having as large an area
as possible, on the surface are picked. One to three
other residues are picked in a checkerboard fashion
across the interaction surface. Choice of widely spaced
residues to vary creates the possibility for high
specificity because all the intervening residues must
have acceptable complementarity before favorable
interactions can occur at widely-separated residues.

The number of residues picked is coupled to the 
range through which each can be varied by the 
restrictions discussed below. In the first round, we do 
not assume any binding between IPBD and the target and 
so progressivity is not an issue. At the first round, 
the user may elect to produce a level of variegation 
such that each molecule of vgDNA is potentially 
different through, for example, unlimited variegation of 
10 codons (20^{10} ≈ 10^{13}). One run of the DNA 
synthesizer produces approximately 10^{13} molecules of 
length 100 nts. Inefficiencies in ligation and 
transformation will reduce the number of proteins 
actually tested to between 10^7 and 5·10^8. Multiple 
replications of the process with such very high levels 
of variegation will not yield repeatable results; the 
user decides whether this is important. 
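
The arithmetic behind these figures can be checked with a short sketch. It assumes, as a simplification not stated in the text, that all sequences are equally likely and that transformants are independent draws, so that the expected fraction of distinct sequences seen among N transformants is 1 - exp(-N/S). The function name is illustrative only.

```python
import math

def fraction_sampled(n_transformants: float, n_sequences: float) -> float:
    """Expected fraction of equally likely sequences seen at least once
    (Poisson approximation; an assumption of this sketch)."""
    return 1.0 - math.exp(-n_transformants / n_sequences)

n_codons = 10
protein_space = 20 ** n_codons   # unlimited variegation of 10 codons
print(f"20^{n_codons} = {protein_space:.3e}")

# Range of proteins actually tested, as quoted in the text.
for n_tested in (1e7, 5e8):
    frac = fraction_sampled(n_tested, protein_space)
    print(f"{n_tested:.0e} tested: {frac:.2e} of the possible sequences")
```

Under these assumptions only a tiny fraction of the possible sequences is examined in any one run, which is consistent with the remark that replicate runs at such high variegation will not give repeatable results.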

III.C. Determining the Substitution Set for Each Parental Residue 

Having picked which residues to vary, we now decide 
the range of amino acids to allow at each variable 
residue. The total level of variegation is the product 
of the number of variants at each varied residue. Each 
varied residue can have a different scheme of 
variegation, producing 2 to 20 different possibilities. 
The set of amino acids which are potentially encoded by 
a given variegated codon is called its "substitution 
set". 

The computer that controls a DNA synthesizer, such 
as the Milligen 7500, can be programmed to synthesize 
any base of an oligo-nt with any distribution of nts by 
taking some nt substrates (e.g., nt phosphoramidites) 
from each of two or more reservoirs. Alternatively, nt 
substrates can be mixed in any ratios and placed in one 
of the extra reservoirs for so-called "dirty bottle" 
synthesis. Each codon could be programmed differently. 
The "mix" of bases at each nucleotide position of the 
codon determines the relative frequency of occurrence of 
the different amino acids encoded by that codon. 

Simply variegated codons are those in which those 
nucleotide positions which are degenerate are obtained 
from a mixture of two or more bases mixed in equimolar 
proportions. These mixtures are described in this 
specification by means of the standardized "ambiguous 
nucleotide" code (Table 1 and 37 CFR §1.822). In this 
code, for example, in the degenerate codon "SNT", "S" 
denotes an equimolar mixture of bases G and C, "N", an 
equimolar mixture of all four bases, and "T", the single 
invariant base thymidine. 

Complexly variegated codons are those in which at 
least one of the three positions is filled by a base 
from an other-than-equimolar mixture of two or more 
bases. 

Either simply or complexly variegated codons may be 
used to achieve the desired substitution set. 

If we have no information indicating that a 
particular amino acid or class of amino acid is 
appropriate, we strive to substitute all amino acids 
with equal probability because representation of one 
mini-protein above the detectable level is wasteful. 
Equal amounts of all four nts at each position in a 
codon (NNN) yields the amino acid distribution in which 
each amino acid is present in proportion to the number 
of codons that code for it. This distribution has the 
disadvantage of giving two basic residues for every 
acidic residue. In addition, six times as much R, S, 
and L as W or M occur. If five codons are synthesized 
with this distribution, each of the 243 sequences 
encoding some combination of L, R, and S is 7776 times 
more abundant than each of the 32 sequences encoding 
some combination of W and M. To have five Ws present at 
detectable levels, we must have each of the (L,R,S) 
sequences present in 7776-fold excess. 
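
The bias of the NNN codon described above can be reproduced from the standard genetic code. The sketch below counts codons per amino acid and, following the text, treats only K and R as basic and only D and E as acidic (histidine is left out; that grouping is an assumption of this sketch).

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

counts = Counter(CODE.values())          # codons per amino acid under NNN
basic = counts["K"] + counts["R"]
acidic = counts["D"] + counts["E"]
per_codon_excess = counts["L"] // counts["W"]
five_codon_excess = per_codon_excess ** 5

print("basic : acidic codons =", basic, ":", acidic)
print("L, R, S codons:", counts["L"], counts["R"], counts["S"])
print("W, M codons   :", counts["W"], counts["M"])
print("sequences over {L,R,S}:", 3 ** 5, " over {W,M}:", 2 ** 5)
print("excess of an (L,R,S) pentamer over a (W,M) pentamer:", five_codon_excess)
```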

Preferably, we also consider the interactions 
between the sites of variegation and the surrounding 
DNA. If the method of mutagenesis to be used is 
replacement of a cassette, we consider whether the 
variegation will generate gratuitous restriction sites 
and whether they seriously interfere with the intended 
introduction of diversity. We reduce or eliminate 
gratuitous restriction sites by appropriate choice of 
variegation pattern and silent alteration of codons 
neighboring the sites of variegation. 

It is generally accepted that the sequence of amino 
acids in a protein or polypeptide determines the three- 
dimensional structure of the molecule, including the 
possibility of no definite structure. Among 
polypeptides of definite length and sequence, some have 
a defined tertiary structure and most do not. 

Particular amino acid residues can influence the 
tertiary structure of a defined polypeptide in several 
ways, including by: 

a) affecting the flexibility of the polypeptide main 
chain, 

b) adding hydrophobic groups, 

c) adding charged groups, 

d) allowing hydrogen bonds, and 

e) forming cross-links, such as disulfides, chelation 
to metal ions, or bonding to prosthetic groups. 

Most works on proteins classify the twenty amino acids 
into categories such as hydrophobic/hydrophilic, 
positive/negative/neutral, or large/small. These 
classifications are useful rules of thumb, but one must 
be careful not to oversimplify. Proteins contain a 
variety of identifiable secondary structural features, 
including: a) α helices, b) 3-10 helices, c) antiparallel 
β sheets, d) parallel β sheets, e) Ω loops, f) 
reverse turns, and g) various cross links. Many people 
have analyzed proteins of known structures and assigned 
each amino acid to one category or another. Using the 
frequency at which particular amino acids occur in 
various types of secondary structures, people have a) 
tried to predict the secondary structures of proteins 
for which only the amino-acid sequence is known (CHOU74, 
CHOU78a, CHOU78b), and b) designed proteins de novo that 
have a particular set of secondary structural elements 
(DEGR87, HECH90). Although some amino acids show 
definite predilection for one secondary form (e.g., VAL 
for β structure and ALA for α helices), these 
preferences are not very strong; Creighton has tabulated 
the preferences (CREI84). In only seven cases does the 
tendency exceed 2.0: 

Amino acid    distinction    ratio 
MET           α/turn         3.7 
PRO           turn/α         3.7 
VAL           β/turn         3.2 
GLY           turn/α         2.9 
ILE           β/turn         2.8 
PHE           β/turn         2.3 
LEU           α/turn         2.2 

Every amino-acid type has been observed in every 
identified secondary structural motif. ARG is particularly 
indiscriminate. 

PRO is generally taken to be a helix breaker. 
Nevertheless, proline often occurs at the beginning of 
helices or even in the middle of a helix, where it 
introduces a slight bend in the helix. Matthews and 
coworkers replaced a PRO that occurs near the middle of 
an α helix in T4 lysozyme. To their surprise, the 
"improved" protein is less stable than the wild-type. 
The rest of the structure had been adapted to fit the 
bent helix. 

Lundeen (LUND86) has tabulated the frequencies of 
amino acids in helices, β strands, turns, and coil in 
proteins of known 3D structure and has distinguished 
between CYSs having free thiol groups and half cystines. 
He reports that free CYS is found most often in helices 
while half cystines are found more often in β sheets. 
Half cystines are, however, regularly found in helices. 
Pease et al. (PEAS90) constructed a peptide having two 
cystines; one end of each is in a very stable α helix. 
Apamin has a similar structure (WEMM83, PEAS88). 

Flexibility: 

GLY is the smallest amino acid, having two 
hydrogens attached to the Cα. Because GLY has no Cβ, it 
confers the most flexibility on the main chain. Thus 
GLY occurs very frequently in reverse turns, 
particularly in conjunction with PRO, ASP, ASN, SER, and 
THR. 

The amino acids ALA, SER, CYS, ASP, ASN, LEU, MET, 
PHE, TYR, TRP, ARG, HIS, GLU, GLN, and LYS have 
unbranched β carbons. Of these, the side groups of SER, 
ASP, and ASN frequently make hydrogen bonds to the main 
chain and so can take on main-chain conformations that 
are energetically unfavorable for the others. VAL, ILE, 
and THR have branched β carbons, which makes the extended 
main-chain conformation more favorable. Thus VAL and 
ILE are most often seen in β sheets. Because the side 
group of THR can easily form hydrogen bonds to the main 
chain, it has less tendency to exist in a β sheet. 

The main chain of proline is particularly 
constrained by the cyclic side group. The φ angle is 
always close to -60°. Most prolines are found near the 
surface of the protein. 

Charge: 

LYS and ARG carry a single positive charge at any 
pH below 10.4 or 12.0, respectively. Nevertheless, the 
methylene groups, four and three respectively, of these 
amino acids are capable of hydrophobic interactions. 
The guanidinium group of ARG is capable of donating five 
hydrogens simultaneously, while the amino group of LYS 
can donate only three. Furthermore, the geometries of 
these groups are quite different, so that these groups 
are often not interchangeable. 

ASP and GLU carry a single negative charge at any 
pH above ≈4.5 and 4.6, respectively. Because ASP has 
but one methylene group, few hydrophobic interactions 
are possible. The geometry of ASP lends itself to 
forming hydrogen bonds to main-chain nitrogens, which is 
consistent with ASP being found very often in reverse 
turns and at the beginning of helices. GLU is more 
often found in α helices and particularly in the amino- 
terminal portion of these helices because the negative 
charge of the side group has a stabilizing interaction 
with the helix dipole (NICH88, SALI88). 

HIS has an ionization pK in the physiological 
range, viz. 6.2. This pK can be altered by the 
proximity of charged groups or of hydrogen donors or 
acceptors. HIS is capable of forming bonds to metal 
ions such as zinc, copper, and iron. 

Hydrogen bonds: 

Aside from the charged amino acids, SER, THR, ASN, 
GLN, TYR, and TRP can participate in hydrogen bonds. 

Cross links: 

The most important form of cross link is the 
disulfide bond formed between two thiols, especially the 
thiols of CYS residues. In a suitably oxidizing 
environment, these bonds form spontaneously. These 
bonds can greatly stabilize a particular conformation of 
a protein or mini-protein. When a mixture of oxidized 
and reduced thiol reagents is present, exchange 
reactions take place that allow the most stable 
conformation to predominate. Concerning disulfides in 
proteins and peptides, see also KATZ90, MATS89, PERR84, 
PERR86, SAUE86, WELL86, JANA89, HORV89, KISH85, and 
SCHN86. 

Other cross links that form without need of 
specific enzymes include: 

1) (CYS)_4:Fe             Rubredoxin (in CREI84, p. 376) 
2) (CYS)_4:Zn             Aspartate Transcarbamylase (in 
                          CREI84, p. 376) and Zn-fingers 
                          (HARD90) 
3) (HIS)_2(MET)(CYS):Cu   Azurin (in CREI84, p. 376) and 
                          Basic "Blue" Cu Cucumber 
4) (HIS)_4:Cu 
5) (CYS)_4:(Fe_4S_4) 
6) (CYS)_2(HIS)_2:Zn 
7) (CYS)_3(HIS):Zn 

Cross links having (HIS)_2(MET)(CYS):Cu have the potential 
advantage that HIS and MET cannot form other cross 
links without Cu. 

Simply Variegated Codons 

The following simply variegated codons are useful 
because they encode a relatively balanced set of amino 
acids: 

1) SNT which encodes the set [L,P,H,R,V,A,D,G]: a) 
one acidic (D) and one basic (R), b) both aliphatic 
(L,V) and aromatic hydrophobics (H), c) large 
(L,R,H) and small (G,A) side groups, d) rigid (P) 
and flexible (G) amino acids, e) each amino acid 
encoded once. 

2) RNG which encodes the set [M,T,K,R,V,A,E,G]: a) 
one acidic and two basic (not optimal, but 
acceptable), b) hydrophilics and hydrophobics, c) 
each amino acid encoded once. 

3) RMG which encodes the set [T,K,A,E]: a) one 
acidic, one basic, one neutral hydrophilic, b) 
three favor α helices, c) each amino acid encoded 
once. 

4) VNT which encodes the set 
[L,P,H,R,I,T,N,S,V,A,D,G]: a) one acidic, one 
basic, b) all classes: charged, neutral-hydrophilic, 
hydrophobic, rigid and flexible, etc., 
c) each amino acid encoded once. 

5) RRS which encodes the set [N,S,K,R,D,E,G_2]: a) two 
acidics, two basics, b) two neutral hydrophilics, 
c) only glycine encoded twice. 

6) NNT which encodes the set 
[F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G]: a) sixteen DNA 
sequences provide fifteen different amino acids; 
only serine is repeated, all others are present in 
equal amounts (This allows very efficient sampling 
of the library.), b) there are equal numbers of 
acidic and basic amino acids (D and R, once each), 
c) all major classes of amino acids are present: 
acidic, basic, aliphatic hydrophobic, aromatic 
hydrophobic, and neutral hydrophilic. 

7) NNG, which encodes the set 
[L_2,R_2,S,W,P,Q,M,T,K,V,A,E,G,stop]: a) fair 
preponderance of residues that favor formation of 
α-helices [L,M,A,Q,K,E; and, to a lesser extent, 
S,R,T]; b) encodes 13 different amino acids. (VHG 
encodes a subset of the set encoded by NNG which 
encodes 9 amino acids in nine different DNA 
sequences, with equal acids and bases, and 5/9 
being α-helix-favoring.) 

For the initial variegation, NNT is preferred, in 
most cases. However, when the codon is encoding an 
amino acid to be incorporated into an a helix, NNG is 
preferred. 
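
The substitution sets listed above follow mechanically from the ambiguity code and the standard genetic code. The following is a minimal sketch of that expansion; the function name `substitution_set` and the dictionary layout are illustrative choices, not part of the specification, and only the ambiguity symbols actually used in this section are included.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

# Ambiguity symbols used in this section (equimolar mixtures).
AMBIG = {"T": "T", "C": "C", "A": "A", "G": "G", "N": "TCAG", "S": "CG",
         "R": "AG", "M": "AC", "V": "ACG", "H": "ACT", "K": "GT"}

def substitution_set(vg_codon: str) -> Counter:
    """Number of DNA sequences of a simply variegated codon per encoded residue."""
    mixes = (AMBIG[symbol] for symbol in vg_codon)
    return Counter(CODE[a + b + c] for a, b, c in product(*mixes))

for vg in ("SNT", "RNG", "RMG", "VNT", "RRS", "NNT", "NNG", "VHG"):
    s = substitution_set(vg)
    n_amino = sum(1 for residue in s if residue != "*")
    print(f"{vg}: {sum(s.values())} DNA sequences, {n_amino} amino acids, "
          f"stops={s.get('*', 0)}, set={''.join(sorted(s))}")
```

Residues that appear with a count of 2 in the output (for example serine under NNT, or leucine and arginine under NNG) correspond to the doubled entries in the sets above.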

Below, we analyze several simple variegations as to 
the efficiency with which the libraries can be sampled. 

Libraries of random hexapeptides encoded by (NNK)_6 
have been reported (SCOT90, CWIR90). Table 130 shows 
the expected behavior of such libraries. NNK produces 
single codons for PHE, TYR, CYS, TRP, HIS, GLN, ILE, 
MET, ASN, LYS, ASP, and GLU (α set); two codons for each 
of VAL, ALA, PRO, THR, and GLY (Φ set); and three codons 
for each of LEU, ARG, and SER (Ω set). We have 
separated the 64,000,000 possible sequences into 28 
classes, shown in Table 130A, based on the number of 
amino acids from each of these sets. The largest class 
is ΦΩαααα with ≈14.6% of the possible sequences. Aside 
from any selection, all the sequences in one class have 
the same probability of being produced. Table 130B 
shows the probability that a given DNA sequence taken 
from the (NNK)_6 library will encode a hexapeptide 
belonging to one of the defined classes; note that only 
≈6.3% of DNA sequences belong to the ΦΩαααα class. 

Table 130C shows the expected numbers of sequences 
in each class for libraries containing various numbers 
of independent transformants (viz. 10^6, 3·10^6, 10^7, 3·10^7, 
10^8, 3·10^8, 10^9, and 3·10^9). At 10^6 independent 
transformants (ITs), we expect to see 56% of the ΩΩΩΩΩΩ 
class, but only 0.1% of the αααααα class. The vast 
majority of sequences seen come from classes for which 
less than 10% of the class is sampled. Suppose a 
peptide from, for example, class ΦΦΩΩαα is isolated by 
fractionating the library for binding to a target. 
Consider how much we know about peptides that are 
related to the isolated sequence. Because only 4% of 
the ΦΦΩΩαα class was sampled, we cannot conclude that 
the amino acids from the Ω set are in fact the best from 
the Ω set. We might have LEU at position 2, but ARG or 
SER could be better. Even if we isolate a peptide of 
the ΩΩΩΩΩΩ class, there is a noticeable chance that 
better members of the class were not present in the 
library. 

With a library of 10^7 ITs, we see that several 
classes have been completely sampled, but that the 
αααααα class is only 1.1% sampled. At 7.6·10^7 ITs, we 
expect display of 50% of all amino-acid sequences, but 
the classes containing three or more amino acids of the 
α set are still poorly sampled. To achieve complete 
sampling of the (NNK)_6 library requires about 3·10^9 ITs, 
10-fold larger than the largest (NNK)_6 library so far 
reported. 
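
The class bookkeeping for (NNK)_6 can be sketched as follows. Two assumptions of this sketch are not stated explicitly in the passage: the single NNK stop codon (TAG) is excluded, leaving 31 codons per position, and the fraction of a class seen among N independent transformants is estimated by the Poisson expression 1 - exp(-N·p), where p is the probability of one particular peptide. With those assumptions the sketch reproduces the percentages quoted above; Tables 130A to 130C themselves are not reproduced here.

```python
import math

N_POS = 6
# (amino acids in the set, NNK codons per amino acid); TAG stop excluded.
ALPHA, PHI, OMEGA = (12, 1), (5, 2), (3, 3)
DNA_TOTAL = 31 ** N_POS   # stop-free DNA sequences

def classes():
    """Map (n_alpha, n_phi, n_omega) -> (peptides in class, DNA copies per peptide)."""
    out = {}
    for a in range(N_POS + 1):
        for f in range(N_POS + 1 - a):
            o = N_POS - a - f
            orderings = math.factorial(N_POS) // (
                math.factorial(a) * math.factorial(f) * math.factorial(o))
            peptides = orderings * ALPHA[0] ** a * PHI[0] ** f * OMEGA[0] ** o
            copies = ALPHA[1] ** a * PHI[1] ** f * OMEGA[1] ** o
            out[(a, f, o)] = (peptides, copies)
    return out

def fraction_seen(copies: int, n_its: float) -> float:
    """Poisson estimate of the fraction of a class present among n_its transformants."""
    return 1.0 - math.exp(-n_its * copies / DNA_TOTAL)

CLASSES = classes()
largest = max(CLASSES, key=lambda k: CLASSES[k][0])
peptides, copies = CLASSES[largest]
print("number of classes:", len(CLASSES))
print("largest class (alpha, phi, omega):", largest)
print("  share of peptides:", round(peptides / 20 ** N_POS, 4))
print("  share of DNA     :", round(peptides * copies / DNA_TOTAL, 4))
print("all-omega class seen at 1e6 ITs:", round(fraction_seen(3 ** 6, 1e6), 3))
print("all-alpha class seen at 1e6 ITs:", round(fraction_seen(1, 1e6), 4))
print("all-alpha class seen at 1e7 ITs:", round(fraction_seen(1, 1e7), 4))
```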

Table 131 shows expectations for a library encoded 
by (NNT)_4(NNG)_2. The expectations of abundance are 
independent of the order of the codons or of 
interspersed unvaried codons. This library encodes 
0.133 times as many amino-acid sequences, but there are 
only 0.0165 times as many DNA sequences. Thus 5.0·10^7 
ITs (i.e. 60-fold fewer than required for (NNK)_6) gives 
almost complete sampling of the library. The results 
would be slightly better for (NNT)_6 and slightly, but not 
much, worse for (NNG)_6. The controlling factor is the 
ratio of DNA sequences to amino-acid sequences. 

Table 132 shows the ratio of #DNA sequences/#AA 
sequences for codons NNK, NNT, and NNG. For NNK and 
NNG, we have assumed that the PBD is displayed as part 
of an essential gene, such as gene III in Ff phage, as 
is indicated by the phrase "assuming stops vanish". It 
is not in any way required that such an essential gene 
be used. If a non-essential gene is used, the analysis 
would be slightly different; sampling of NNK and NNG 
would be slightly less efficient. Note that (NNT)_6 gives 
3.6-fold more amino-acid sequences than (NNK)_5 but 
requires 1.7-fold fewer DNA sequences. Note also that 
(NNT)_7 gives twice as many amino-acid sequences as 
(NNK)_6, but 3.3-fold fewer DNA sequences. 
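
The comparisons in the two preceding paragraphs reduce to counting. The sketch below is a plain recomputation, not a reproduction of Table 132; it assumes that stops vanish, so that NNK contributes 31 DNA sequences for 20 amino acids, NNT 16 for 15, and NNG 15 for 13. The helper name `library_counts` is hypothetical.

```python
def library_counts(n_nnk: int = 0, n_nnt: int = 0, n_nng: int = 0):
    """Return (#amino-acid sequences, #DNA sequences), assuming stops vanish."""
    aa = 20 ** n_nnk * 15 ** n_nnt * 13 ** n_nng
    dna = 31 ** n_nnk * 16 ** n_nnt * 15 ** n_nng
    return aa, dna

# Per-codon ratio of DNA sequences to amino-acid sequences.
for name, kwargs in (("NNK", {"n_nnk": 1}), ("NNT", {"n_nnt": 1}), ("NNG", {"n_nng": 1})):
    aa, dna = library_counts(**kwargs)
    print(f"{name}: {dna}/{aa} = {dna / aa:.3f}")

nnk6 = library_counts(n_nnk=6)
nnk5 = library_counts(n_nnk=5)
mixed = library_counts(n_nnt=4, n_nng=2)
nnt6 = library_counts(n_nnt=6)
nnt7 = library_counts(n_nnt=7)

print("(NNT)4(NNG)2 vs (NNK)6: AA x%.3f, DNA x%.4f"
      % (mixed[0] / nnk6[0], mixed[1] / nnk6[1]))
print("(NNT)6 vs (NNK)5: AA x%.2f, DNA 1/%.2f"
      % (nnt6[0] / nnk5[0], nnk5[1] / nnt6[1]))
print("(NNT)7 vs (NNK)6: AA x%.2f, DNA 1/%.2f"
      % (nnt7[0] / nnk6[0], nnk6[1] / nnt7[1]))
```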

Thus, while it is possible to use a simple mixture 
(NNS, NNK or NNN) to obtain at a particular position all 
twenty amino acids, these simple mixtures lead to a 
highly biased set of encoded amino acids. This problem 
can be overcome by use of complexly variegated codons. 

Complexly Variegated Codons 

Let Abun(x) be the abundance of DNA sequences 
coding for amino acid x, defined by the distribution of 
nts at each base of the codon. For any distribution, 
there will be a most-favored amino acid (mfaa) with 
abundance Abun(mfaa) and a least-favored amino acid 
(lfaa) with abundance Abun(lfaa). We seek the nt 
distribution that allows all twenty amino acids and that 
yields the largest ratio Abun(lfaa)/Abun(mfaa), subject, 
if desired, to further constraints. 

We first will present the mixture calculated to be 
optimal when the nt distribution is subject to two 
constraints: equal abundances of acidic and basic amino 
acids and the least possible number of stop codons. 
Thus only nt distributions that yield Abun(E)+Abun(D) = 
Abun(R)+Abun(K) are considered, and the function 
maximized is: 

{(1-Abun(stop))(Abun(lfaa)/Abun(mfaa))}. 

We have simplified the search for an optimal nt 
distribution by limiting the third base to T or G (C or 
G is equivalent). All amino acids are possible and the 
number of accessible stop codons is reduced because TGA 
and TAA codons are eliminated. The amino acids F, Y, C, 
H, N, I, and D require T at the third base while W, M, 
Q, K, and E require G. Thus we use an equimolar mixture 
of T and G at the third base. However, it should be 
noted that the present invention embraces use of 
complexly variegated codons in which the third base is 
not limited to T or G (or to C or G). 

A computer program, written as part of the present 
invention and named "Find Optimum vgCodon" (See Table 
9), varies the composition at bases 1 and 2, in steps of 
0.05, and reports the composition that gives the largest 
value of the quantity {(Abun(lfaa)/Abun(mfaa))(1-Abun(stop))}. 
A vg codon is symbolically defined by 
the nucleotide distribution at each base: 

              T     C     A     G 
base #1 =     t1    c1    a1    g1 
base #2 =     t2    c2    a2    g2 
base #3 =     t3    c3    a3    g3 

t1 + c1 + a1 + g1 = 1.0 
t2 + c2 + a2 + g2 = 1.0 
t3 = g3 = 0.5, c3 = a3 = 0. 

The variation of the quantities t1, c1, a1, g1, t2, c2, 
a2, and g2 is subject to the constraint that: 

Abun(E) + Abun(D) = Abun(K) + Abun(R) 
Abun(E) + Abun(D) = g1*a2 
Abun(K) + Abun(R) = a1*a2/2 + c1*g2 + a1*g2/2 
g1*a2 = a1*a2/2 + c1*g2 + a1*g2/2 

Solving for g2, we obtain 

g2 = (g1*a2 - 0.5*a1*a2) / (c1 + 0.5*a1) 

In addition, 

t1 = 1 - a1 - c1 - g1 
t2 = 1 - a2 - c2 - g2 

We vary a1, c1, g1, a2, and c2 and then calculate t1, 
g2, and t2. Initially, variation is in steps of 5%. 
Once an approximately optimum distribution of 
nucleotides is determined, the region is further 
explored with steps of 1%. The logic of this program is 
shown in Table 9. The optimum distribution (the "gfk" 
codon) is shown in Table 10A and yields DNA molecules 
encoding each type amino acid with the abundances 
shown. 
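
The search described above can be illustrated with a small sketch. This is not the program of Table 9, which is not reproduced in this section; it is a coarse grid search written from the constraint equations given above, with hypothetical function names, and it is run here at a step of 0.1 only to keep the example fast. Its output should not be expected to match Table 10A.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}
D3 = {"T": 0.5, "C": 0.0, "A": 0.0, "G": 0.5}   # third base: equimolar T and G

def abundances(d1, d2, d3=D3):
    """Abun(x) for every amino acid x (and '*' for stop) from per-base nt fractions."""
    abun = {}
    for codon, aa in CODE.items():
        p = d1[codon[0]] * d2[codon[1]] * d3[codon[2]]
        abun[aa] = abun.get(aa, 0.0) + p
    return abun

def score(abun):
    """(1 - Abun(stop)) * Abun(lfaa) / Abun(mfaa), the quantity maximized in the text."""
    aa = [v for k, v in abun.items() if k != "*"]
    return (1.0 - abun["*"]) * min(aa) / max(aa)

def search(step=0.1):
    """Grid search over a1, c1, g1, a2, c2; g2 is fixed by Abun(E)+Abun(D) = Abun(K)+Abun(R)."""
    n = round(1 / step)
    grid = [i * step for i in range(n + 1)]
    best = (0.0, None, None)
    for a1, c1, g1 in product(grid, repeat=3):
        t1 = 1 - a1 - c1 - g1
        if t1 < -1e-9 or c1 + 0.5 * a1 == 0:
            continue
        d1 = {"T": max(t1, 0.0), "C": c1, "A": a1, "G": g1}
        for a2, c2 in product(grid, repeat=2):
            g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)
            t2 = 1 - a2 - c2 - g2
            if g2 < 0 or t2 < 0:
                continue
            d2 = {"T": t2, "C": c2, "A": a2, "G": g2}
            s = score(abundances(d1, d2))
            if s > best[0]:
                best = (s, d1, d2)
    return best

equimolar = {b: 0.25 for b in BASES}
print("unoptimized NNK score:", round(score(abundances(equimolar, equimolar)), 4))
best_score, best_d1, best_d2 = search(step=0.1)
print("coarse-grid best score:", round(best_score, 4))
print("base 1:", best_d1)
print("base 2:", best_d2)
```

A finer step (0.05, then 0.01 around the coarse optimum, as the text describes) follows the same pattern at a correspondingly higher cost.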

Note that this chemistry encodes all twenty amino 
acids, with acidic and basic amino acids being 
equiprobable, and the most favored amino acid (serine) 
is encoded only 2.454 times as often as the least 
favored amino acid (tryptophan) . The "gfk" vg codon 
improves sampling most for peptides containing several 
of the amino acids [F,Y,C,W,H,Q,I,M,N,K,D,E] for which 
NNK or NNS provide only one codon. Its sampling 
advantages are most pronounced when the library is 
relatively small. 

A modification of "Find Optimum vgCodon" varies the 
composition at bases 1 and 2, in steps of 0.01, and 
reports the composition that gives the largest value of 
the quantity {(Abun(lfaa)/Abun(mfaa))} without any 
restraint on the relative abundance of any amino acids. 
The results of this optimization are shown in Table 10B. 
The changes are small, indicating that insisting on 
equality of acids and bases and minimizing stop codons 
costs us little. Also note that, without restraining 
the optimization, the prevalence of acidic and basic 
amino acids comes out fairly close. On the other hand, 
relaxing the restriction leaves a distribution in which 
the least favored amino acid is only .412 times as 
prevalent as SER. 

The advantages of an NNT codon are discussed 
elsewhere in the present application. Unoptimized NNT 
provides 15 amino acids encoded by only 16 DNA 
sequences. It is possible to improve on NNT as follows. 
First note that the SER codons occur in the T and A rows 
of the genetic-code table and in the C and G columns. 

[SER] = T_1 × C_2 + A_1 × G_2 

If we reduce the prevalence of SER by reducing T_1, C_2, 
A_1, and G_2 relative to other bases, then we will also 
reduce the prevalence of PHE, TYR, CYS, PRO, THR, ALA, 
ARG, GLY, ILE, and ASN. The prevalence of LEU, HIS, 
VAL, and ASP will rise. If we assume that T_1, C_2, A_1, 
and G_2 are all lowered to the same extent and that C_1, 
G_1, T_2, and A_2 are increased by the same amount, we can 
compute a shift that makes the prevalence of SER equal 
the prevalences of LEU, HIS, VAL, and ASP. The 
decreases in each of PHE, TYR, CYS, PRO, THR, ALA, ARG, 
GLY, ILE, and ASN are not equal; CYS and THR are reduced 
more than the others. 

Let the distribution be 

              T        C        A        G 
base #1 =     .25-q    .25+q    .25-q    .25+q 
base #2 =     .25+q    .25-q    .25+q    .25-q 
base #3 =     1.00     0.0      0.0      0.0 

Setting [SER] = [LEU] = [HIS] = [VAL] = [ASP] gives: 

(.25-q)·(.25-q) + (.25-q)·(.25-q) = (.25+q)·(.25+q) 
2·(.25-q)^2 = (.25+q)^2 
q^2 - 1.5 q + .0625 = 0 
q = (3/4) - √2/2 ≈ .0429 

This distribution (shown in Table 10C) gives five 
amino acids (SER, LEU, HIS, VAL, ASP) in very nearly 
equal amounts. A further eight amino acids (PHE, TYR, 
ILE, ASN, PRO, ALA, ARG, GLY) are present at about 71% 
the abundance of SER. THR and CYS remain at half the 
abundance of SER. When variegating DNA for disulfide- 
bonded mini-proteins, it is often desirable to reduce 
the prevalence of CYS. This distribution allows 13 
amino acids to be seen at high level and gives no stops; 
the optimized fxS distribution allows only 11 amino 
acids at high prevalence. 
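
The shift q and the resulting abundances can be recomputed directly from the distribution given above. The sketch below is an independent check, not a reproduction of Table 10C; the variable names are illustrative.

```python
import math

# Smaller root of q^2 - 1.5 q + 0.0625 = 0.
q = 0.75 - math.sqrt(2) / 2
lo, hi = 0.25 - q, 0.25 + q
base1 = {"T": lo, "C": hi, "A": lo, "G": hi}
base2 = {"T": hi, "C": lo, "A": hi, "G": lo}

# The third base is always T, so each (base 1, base 2) pair is one codon.
NNT = {"TT": "PHE", "TC": "SER", "TA": "TYR", "TG": "CYS",
       "CT": "LEU", "CC": "PRO", "CA": "HIS", "CG": "ARG",
       "AT": "ILE", "AC": "THR", "AA": "ASN", "AG": "SER",
       "GT": "VAL", "GC": "ALA", "GA": "ASP", "GG": "GLY"}

abun = {}
for (b1, b2), aa in NNT.items():
    abun[aa] = abun.get(aa, 0.0) + base1[b1] * base2[b2]

print(f"q = {q:.4f}")
for aa, value in sorted(abun.items(), key=lambda item: -item[1]):
    print(f"{aa}: {value:.4f}  ({value / abun['SER']:.2f} x SER)")
```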

The NNG codon can also be optimized. Table 10D 
shows an approximately optimized NNG codon. When 
equimolar T,C,A,G are used in NNG, one obtains double 
doses of LEU and ARG. To improve the distribution, we 
increase G_1 by 4δ, decrease T_1 and A_1 by δ each and C_1 by 
2δ. We adopt this pattern because C_1 affects both LEU 
and ARG while T_1 and A_1 each affect either LEU or ARG, 
but not both. Similarly, we decrease T_2 and G_2 by r 
while we increase C_2 and A_2 by r. We adjusted δ and r 
until [ALA] ≈ [ARG]. There are, under this variegation, 
four equally most-favored amino acids: LEU, ARG, ALA, 
and GLU. Note that there is one acidic and one basic 
amino acid in this set. There are two equally least 
favored amino acids: TRP and MET. The ratio of 
lfaa/mfaa is 0.5258. If this codon is repeated six 
times, peptides composed entirely of TRP and MET are 2% 
as common as peptides composed entirely of the most 
favored amino acids. We refer to this as "the 
prevalence of (TRP/MET)_6 in optimized NNG vgDNA". 
When synthesizing vgDNA by the "dirty bottle" 
method, it is sometimes desirable to use only a limited 
number of mixes. One very useful mixture is called the 
"optimized NNS mixture" in which we average the first 
two positions of the fxS mixture: 
A_1 = 0.33, G_1 = 0.26, the second position is identical to 
the first, C_3 = G_3 = 0.5. This distribution provides the 
amino acids ARG, SER, LEU, GLY, VAL, THR, ASN, and LYS 
at greater than 5% plus ALA, ASP, GLU, ILE, MET, and TYR 
at greater than 4%. 

An additional complexly variegated codon is of 
interest. This codon is identical to the optimized NNT 
codon at the first two positions and has T:G = 90:10 at 
the third position. This codon provides thirteen amino 
acids (ALA, ILE, ARG, SER, ASP, LEU, VAL, PHE, ASN, GLY, 
PRO, TYR, and HIS) at more than 5.5%. THR at 4.3% and 
CYS at 3.9% are more common than the LFAAs of NNK 
(3.125%). The remaining five amino acids are present at 
less than 1%. This codon has the feature that all amino 
acids are present; sequences having more than two of the 
low-abundance amino acids are rare. When we isolate an 
SBD using this codon, we can be reasonably sure that the 
first 13 amino acids were tested at each position. A 
similar codon, based on optimized NNG, could be used. 

Table 10E shows some properties of an unoptimized 
NNS (or NNK) codon. Note that there are three equally 
most-favored amino acids: ARG, LEU, and SER. There are 
also twelve equally least favored amino acids: PHE, 
ILE, MET, TYR, HIS, GLN, ASN, LYS, ASP, GLU, CYS, and 
TRP. Five amino acids (PRO, THR, ALA, VAL, GLY) fall in 
between. Note that a six-fold repetition of NNS gives 
sequences composed of the amino acids [PHE, ILE, MET, 
TYR, HIS, GLN, ASN, LYS, ASP, GLU, CYS, and TRP] at only 
≈0.1% of the sequences composed of [ARG, LEU, and SER]. 
Not only is this ≈20-fold lower than the prevalence of 
(TRP/MET)_6 in optimized NNG vgDNA, but this low 
prevalence applies to twelve amino acids. 

Diffuse Mutagenesis 

Diffuse Mutagenesis can be applied to any part of 
the protein at any time, but is most appropriate when 
some binding to the target has been established. 
Diffuse Mutagenesis can be accomplished by spiking each 
of the pure nts activated for DNA synthesis (e.g. nt- 
phosphoramidites) with a small amount of one or more of 
the other activated nts. 

Contrary to general practice, the present invention 
sets the level of spiking so that only a small 
percentage (1% to .00001%, for example) of the final 
product will contain the initial DNA sequence. This 
will ensure that many single, double, triple, and higher 
mutations occur, but that recovery of the basic sequence 
will be a possible outcome. Let N_b be the number of 
bases to be varied, and let Q be the fraction of all 
sequences that should have the parental sequence, then 
M, the fraction of the mixture that is the majority 
component, is 

M = exp{ log_e(Q)/N_b } = 10^{log_10(Q)/N_b}. 
If, for example, thirty base pairs on the DNA chain were 
to be varied and 1% of the product is to have the 
parental sequence, then each mixed nt substrate should 
contain 86% of the parental nt and 14% of other nts. 
Table 8 shows the fraction f(n) of DNA molecules having 
n non-parental bases when 30 bases are synthesized with 
reagents that contain fraction M of the majority 
component. When M=.63096, f24 and higher are less than 
10^{-8}. The entry "most" in Table 8 is the number of 
changes that has the highest probability. Note that 
substantial probability for multiple substitutions only 
occurs if the fraction of parental sequence (f0) is 
allowed to drop to around 10^{-6}. The N_b base pairs of the 
DNA chain that are synthesized with mixed reagents need 
not be contiguous. They are picked so that between N_b/3 
and N_b codons are affected to various degrees. The 
residues picked for mutation are picked with reference 
to the 3D structure of the IPBD, if known. For example, 
one might pick all or most of the residues in the 
principal and secondary set. We may impose restrictions 
on the extent of variation at each of these residues 
based on homologous sequences or other data. The 
mixture of non-parental nts need not be random, rather 
mixtures can be biased to give particular amino acid 
types specific probabilities of appearance at each 
codon. For example, one residue may contain a 
hydrophobic amino acid in all known homologous 
sequences; in such a case, the first and third base of 
that codon would be varied, but the second would be set 
to T. Other examples of how this might be done are 
given in the horse heart myoglobin example. This 
diffuse structure-directed mutagenesis will reveal the 
subtle changes possible in protein backbone associated 
with conservative interior changes, such as V to I, as 
well as some not so subtle changes that require 
concomitant changes at two or more residues of the 
protein. 
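
The relation between Q, N_b, and M can be illustrated with a short sketch. It assumes, as a simplification that the text does not state, that each varied base is incorporated independently, so that the number of non-parental bases per molecule follows a binomial distribution; Table 8 itself is not reproduced, and the function names are illustrative.

```python
import math

def majority_fraction(q: float, n_bases: int) -> float:
    """M such that M**n_bases == q, i.e. M = exp(ln(Q)/N_b)."""
    return math.exp(math.log(q) / n_bases)

def f(n: int, m: float, n_bases: int) -> float:
    """Probability of exactly n non-parental bases among n_bases (binomial assumption)."""
    return math.comb(n_bases, n) * (1 - m) ** n * m ** (n_bases - n)

N_B = 30
for q in (1e-2, 1e-6):
    m = majority_fraction(q, N_B)
    dist = [f(n, m, N_B) for n in range(N_B + 1)]
    most = max(range(N_B + 1), key=lambda n: dist[n])
    print(f"Q = {q:g}: M = {m:.5f}, f0 = {dist[0]:.2e}, "
          f"most probable number of changes = {most}")
```

For thirty bases and Q = 1%, this gives M of about 86%, matching the example above; Q = 10^{-6} corresponds to M = .63096.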

III.D. Special Considerations Relating to Variegation 
of Mini-Proteins with Essential Cysteines 

Several of the preferred simple or complex 
variegated codons encode a set of amino acids which 
includes cysteine. This means that some of the encoded 
binding domains will feature one or more cysteines in 
addition to the invariant disulfide-bonded cysteines. 
For example, at each NNT-encoded position, there is a 
one in sixteen chance of obtaining cysteine. If six 
codons are so varied, the fraction of domains containing 
additional cysteines is about 0.32. Odd numbers of cysteines 
can lead to complications, see Perry and Wetzel 
(PERR86). On the other hand, many disulfide-containing 
proteins contain cysteines that do not form disulfides, 
e.g. trypsin. The possibility of unpaired cysteines can 
be dealt with in several ways: 

First, the variegated phage population can be 
passed over an immobilized reagent that strongly binds 
free thiols, such as SulfoLink (catalogue number 44895 H 
from Pierce Chemical Company, Rockford, Illinois, 
61105). Another product from Pierce is TNB-Thiol 
Agarose (Catalogue Code 20409 H). BioRad sells Affi- 
Gel 401 (catalogue 153-4599) for this purpose. 

Second, one can use a variegation that excludes
cysteines, such as:

NHT that gives [F,S,Y,L,P,H,I,T,N,V,A,D],
VNS that gives [L2,P2,H,Q,R3,I,M,T2,N,K,S,V2,A2,D,E,G2],
NNG that gives [L2,S,W,P,Q,R2,M,T,K,V,A,E,G,stop],
SNT that gives [L,P,H,R,V,A,D,G],
RNG that gives [M,T,K,R,V,A,E,G],
RMG that gives [T,K,A,E],
VNT that gives [L,P,H,R,I,T,N,S,V,A,D,G], or
RRS that gives [N,S,K,R,D,E,G2].
However, each of these schemes has one or more of the following
disadvantages relative to NNT: a) fewer amino acids are
allowed, b) amino acids are not evenly provided, c)
acidic and basic amino acids are not equally likely, or
d) stop codons occur. Nonetheless, NNG, NHT, and VNT
are almost as useful as NNT. NNG encodes 13 different
amino acids and one stop signal. Only two amino acids
appear twice in the 16-fold mix.

Thirdly, one can enrich the population for binding 
to the preselected target, and evaluate selected 
sequences post hoc for extra cysteines. Those that 
contain more cysteines than the cysteines provided for 
conformational constraint may be perfectly usable. It 
is possible that a disulfide linkage other than the 
designed one will occur. This does not mean that the 
binding domain defined by the isolated DNA sequence is 
in any way unsuitable. The suitability of the isolated 
domains is best determined by chemical and biochemical 
evaluation of chemically synthesized peptides. 

Lastly, one can block free thiols with reagents,
such as Ellman's reagent, iodoacetate, or methyl iodide,
that specifically bind free thiols and that do not react
with disulfides, and then leave the modified phage in
the population. It is to be understood that the
blocking agent may alter the binding properties of the
mini-protein; thus, one might use a variety of blocking
reagents in expectation that different binding domains
will be found. The variegated population of
thiol-blocked genetic packages is fractionated for binding.
If the DNA sequence of the isolated binding mini-protein
contains an odd number of cysteines, then synthetic
means are used to prepare mini-proteins having each
possible linkage and in which the odd thiol is
appropriately blocked. Nishiuchi (NISH82, NISH86, and
works cited therein) discloses methods of synthesizing
peptides that contain a plurality of cysteines so that
each thiol is protected with a different type of
blocking group. These groups can be selectively removed
so that the disulfide pairing can be controlled. We
envision using such a scheme with the alteration that
one thiol either remains blocked, or is unblocked and
then reblocked with a different reagent.
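
The codon sets listed in this section, and the estimate of how often an extra cysteine appears, can be checked with a short script. The following Python sketch is illustrative only: it assumes the standard genetic code and the IUPAC nucleotide ambiguity codes, and the helper names (`expand`, `encoded`, `fraction_with_extra_cys`) are hypothetical and not part of the disclosed method.

```python
from collections import Counter
from itertools import product

# Standard genetic code with bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes that appear in the variegated codons of the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "H": "ACT",
         "V": "ACG", "S": "CG", "R": "AG", "M": "AC", "K": "GT", "D": "AGT"}


def expand(vg_codon):
    """List every DNA codon allowed by a variegated codon such as 'NNT'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in vg_codon))]


def encoded(vg_codon):
    """Count how many codons of the mix encode each amino acid (or stop)."""
    return Counter(CODON_TABLE[c] for c in expand(vg_codon))


def fraction_with_extra_cys(vg_codon, n_positions):
    """Fraction of domains with at least one cysteine at the varied codons,
    assuming all codons of the mix are equally likely and independent."""
    counts = encoded(vg_codon)
    p_cys = counts["C"] / sum(counts.values())
    return 1.0 - (1.0 - p_cys) ** n_positions


for vg in ("NNT", "NHT", "NNG", "VNT", "RRS"):
    counts = encoded(vg)
    label = "cysteine" if "C" in counts else "no cysteine"
    print(vg, "".join(sorted(counts)), label)

print(round(fraction_with_extra_cys("NNT", 6), 2))
```

Under these assumptions the last line evaluates 1 - (15/16)^6, which comes to roughly 0.32, in line with the approximate fraction quoted above for six NNT codons.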

III.E. Planning the Second and Later Rounds of Variegation

The method of the present invention allows 
efficient accumulation of information concerning the 
amino-acid sequence of a binding domain having high
affinity for a predetermined target. Although one may 
obtain a highly useful binding domain from a single 
round of variegation and affinity enrichment, we expect 
that multiple rounds will be needed to achieve the 
highest possible affinity and specificity. 

If the first round of variegation results in some 
binding to the target, but the affinity for the target 
is still too low, further improvement may be achieved by 
variegation of the SBDs. Preferably, the process is
progressive, i.e. each variegation cycle produces a
better starting point for the next variegation cycle
than the previous cycle produced. Setting the level of
variegation such that the ppbd and many sequences
related to the ppbd sequence are present in detectable
amounts ensures that the process is progressive. If the
level of variegation is so high that the ppbd sequence
is present at such low levels that there is an
appreciable chance that no transformant will display the
PPBD, then the best SBD of the next round could be worse
than the PPBD. At an excessively high level of
variegation, each round of mutagenesis is independent of
previous rounds and there is no assurance of
progressivity. This approach can lead to valuable
binding proteins, but repetition of experiments with 
this level of variegation will not yield progressive 
results. Excessive variation is not preferred. 

Progressivity is not an all-or-nothing property. 
So long as most of the information obtained from 
previous variegation cycles is retained and many 
different surfaces that are related to the PPBD surface 
are produced, the process is progressive. If the level 
of variegation is so high that the ppbd gene may not be 
detected, the assurance of progressivity diminishes. If 
the probability of recovering PPBD is negligible, then
the probability of progressive behavior is also
negligible.

A level of variegation that allows recovery of the 
PPBD has two properties: 

1) we cannot regress because the PPBD is available,
2) an enormous number of multiple changes related to
   the PPBD are available for selection and we are
   able to detect and benefit from these changes.

It is very unlikely that all of the variants will be
worse than the PPBD; we desire the presence of PPBD at
detectable levels to ensure that all the sequences
present are indeed related to PPBD.

An opposing force in our design considerations is
that PBDs are useful in the population only up to the
amount that can be detected; any excess above the
detectable amount is wasted. Thus we produce as many
surfaces related to PPBD as possible within the
constraint that the PPBD be detectable.
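
The trade-off between diversity and detectability of the parental sequence can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not the procedure of the specification: it assumes every DNA sequence in the variegated mix is equally likely, that the parental (ppbd) sequence is one of them, and that the number of transformants carrying it is Poisson distributed. The transformant count of 10^7 and the function names are hypothetical.

```python
import math


def expected_parent_copies(n_transformants, n_dna_sequences):
    """Mean number of transformants carrying the parental (ppbd) sequence,
    assuming each DNA sequence in the mix is equally likely."""
    return n_transformants / n_dna_sequences


def prob_parent_present(n_transformants, n_dna_sequences):
    """Poisson estimate of the chance that at least one transformant
    carries the parental sequence."""
    mean = expected_parent_copies(n_transformants, n_dna_sequences)
    return 1.0 - math.exp(-mean)


M = 10**7  # hypothetical number of independent transformants

for k in range(4, 9):      # number of codons varied with NNT
    d = 16 ** k            # distinct DNA sequences in the variegated mix
    print(k, d,
          round(expected_parent_copies(M, d), 3),
          round(prob_parent_present(M, d), 3))
```

With these illustrative numbers the parental sequence is abundant when few codons are varied and becomes unlikely to be present at all once the number of DNA sequences greatly exceeds the number of transformants, which is the regime the text describes as no longer assuring progressivity.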

If the level of variegation in the previous 
variegation cycle was correctly chosen, then the amino 
acids selected to be in the residues just varied are the
ones best determined. The environment of other residues
has changed, so that it is appropriate to vary them
again. Because there are often more residues in the
principal and secondary sets than can be varied
simultaneously, we start by picking residues that either
have never been varied (highest priority) or that have
not been varied for one or more cycles. If we find that
varying all the residues except those varied in the
previous cycle does not allow a high enough level of
diversity, then residues varied in the previous cycle
might be varied again. For example, if M_ntv (the number
of independent transformants that can be produced from
Y_D100 of DNA) and C_sensi (the sensitivity of the
affinity separation) were such that seven residues could be
varied, and if the principal and secondary sets 
contained 13 residues, we would always vary seven 
residues, even though that implies varying some residue 
twice in a row. In such cases, we would pick the 
residues just varied that contain the amino acids of 
highest abundance in the variegated codons used. 

It is the accumulation of information that allows 
the process to select those protein sequences that 
produce binding between the SBD and the target. Some
interfaces between proteins and other molecules involve
twenty or more residues. Complete variation of twenty
residues would generate 10^26 different proteins. By
dividing the residues that lie close together in space
into overlapping groups of five to seven residues, we
can vary a large surface but never need to test more
than 10^7 to 10^9 candidates at once, a savings of 10^19 to
10^17 fold. The power of selection with accumulation of
information is well illustrated in Chapter 3 of DAWK86.
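
The orders of magnitude quoted above follow from simple counting. As a rough check, the Python sketch below (an illustration only, assuming all twenty amino acids are allowed at each varied residue; the function name is hypothetical) compares complete variation of twenty residues with variation of groups of five or seven residues.

```python
import math

ALPHABET = 20  # amino acids allowed at each fully varied residue


def n_sequences(n_residues):
    """Number of distinct proteins when n_residues are varied completely."""
    return ALPHABET ** n_residues


full = n_sequences(20)
print(f"20 residues: about 10^{math.log10(full):.0f} sequences")

for group in (5, 7):
    partial = n_sequences(group)
    saving = full // partial
    print(f"{group} residues: about 10^{math.log10(partial):.1f} sequences, "
          f"saving about 10^{math.log10(saving):.1f} fold")
```

The exact powers of twenty differ slightly from the rounded powers of ten in the text, but the conclusion is the same: varying a large surface in overlapping groups reduces the number of candidates per round by many orders of magnitude.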

Use of NNT or NNG variegated codons leads to very 
efficient sampling of variegated libraries because the 
ratio of (different amino-acid sequences)/(different DNA
sequences) is much closer to unity than it is for NNK or
even the optimized vg codon (fxS). Nevertheless, a few
amino acids are omitted in each case. Both NNT and NNG
allow members of all important classes of amino acids:
hydrophobic, hydrophilic, acidic, basic, neutral
hydrophilic, small, and large. After selecting a
binding domain, a subsequent variegation and selection
may be desirable to achieve a higher affinity or
specificity. During this second variegation, amino acid
possibilities overlooked by the preceding variegation
may be investigated.
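
The sampling-efficiency argument can be illustrated by counting, for each variegated codon, the distinct amino acids encoded against the distinct DNA codons in the mix. This is a minimal sketch assuming the standard genetic code; stop codons are counted among the DNA sequences but not among the amino acids, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code with bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def codon_counts(vg_codon):
    """Return (distinct amino acids, distinct DNA codons) for one vg codon."""
    codons = ["".join(c) for c in product(*(IUPAC[b] for b in vg_codon))]
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    return len(amino_acids), len(codons)


def sequence_ratio(vg_codon, n_positions):
    """(different amino-acid sequences)/(different DNA sequences) when the
    same vg codon is used independently at n_positions."""
    n_aa, n_dna = codon_counts(vg_codon)
    return (n_aa / n_dna) ** n_positions


for vg in ("NNT", "NNG", "NNK"):
    print(vg, codon_counts(vg), round(sequence_ratio(vg, 6), 3))
```

For a single codon the counts are 15 of 16 for NNT, 13 of 16 for NNG, and 20 of 32 for NNK, so the advantage of NNT and NNG grows with each additional varied position.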

In the "first round", we assume that the parental
protein has no known affinity for the target material.
For example, consider the parental mini-protein, similar
to that discussed in Example 11, having the structure
X1-C2-X3-X4-X5-X6-C7-X8 (SEQ ID NO:7) in which C2 and C7 form
a disulfide bond. Introduction of extra cysteines may
cause alternative structures to form which might be
disadvantageous. Accidental cysteines at positions 4 or
5 are thought to be potentially more troublesome than at
the other positions. We adopt the pattern of
variegation: X1:NNT, X3:NNT, X4:NNG, X5:NNG, X6:NNT, and
X8:NNT, so that cysteine cannot occur at positions 4 and
5 (DNA sequence NNT.TGT.NNT.NNG.NNG.NNT.TGT.NNT has SEQ
ID NO:89). (Table 131 shows the number of different
amino acids expected in libraries that are
variegated in this way and comprising different numbers
of independent transformants.)
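
The size of the library defined by the pattern NNT.TGT.NNT.NNG.NNG.NNT.TGT.NNT can be worked out directly. The sketch below is only an illustration and does not reproduce Table 131: it assumes the standard genetic code, counts amino-acid sequences free of stop codons, and uses a simple Poisson sampling estimate for the number of distinct DNA sequences seen among a given number of transformants. The function names and the transformant counts are hypothetical.

```python
import math
from itertools import product

# Standard genetic code with bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

PATTERN = "NNT.TGT.NNT.NNG.NNG.NNT.TGT.NNT".split(".")


def expand(vg_codon):
    """List every DNA codon allowed by one (possibly variegated) codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in vg_codon))]


def library_sizes(pattern):
    """Return (distinct DNA sequences, distinct stop-free protein sequences)."""
    n_dna, n_protein = 1, 1
    for vg in pattern:
        codons = expand(vg)
        n_dna *= len(codons)
        n_protein *= len({CODON_TABLE[c] for c in codons} - {"*"})
    return n_dna, n_protein


def expected_distinct(n_transformants, n_dna):
    """Expected number of distinct DNA sequences among the transformants,
    assuming every DNA sequence is equally likely (Poisson approximation)."""
    return n_dna * (1.0 - math.exp(-n_transformants / n_dna))


n_dna, n_protein = library_sizes(PATTERN)
print(n_dna, n_protein)
for m in (10**6, 10**7, 10**8):   # hypothetical numbers of transformants
    print(m, round(expected_distinct(m, n_dna)))
```

The pattern gives 16^6 DNA sequences and 15^4 x 13^2 stop-free amino-acid sequences, so a library of a few times 10^7 independent transformants would be expected to sample most of the available diversity under these assumptions.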

In the second round of variegation, a preferred
strategy is to vary each position through a new set of
residues which includes the amino acid(s) which were
found at that position in the successful binding
domains, and which includes as many as possible of the
residues which were excluded in the first round of
variegation.

A few examples may be helpful. Suppose we obtained
PRO using NNT. This amino acid is available with either
NNT or NNG. We can be reasonably sure that PRO is the
best amino acid from the set [PRO, LEU, VAL, THR, ALA,
ARG, GLY, PHE, TYR, CYS, HIS, ILE, ASN, ASP, SER]. Thus
we need to try a set that includes [PRO, TRP, GLN, MET,
LYS, GLU]. The set allowed by NNG is the preferred set.

What if we obtained HIS instead? Histidine is 
aromatic and fairly hydrophobic and can form hydrogen 
bonds to and from the imidazole ring. Tryptophan is 
hydrophobic and aromatic and can donate a hydrogen to a 
suitable acceptor and was excluded by the NNT codon. 
Methionine was also excluded and is hydrophobic. Thus, 
one preferred course is to use the variegated codon HDS 
that allows [HIS, GLN, ASN, LYS , TYR, CYS, TRP, ARG, 
SER, GLY, <stop>] . 

GLN can be encoded by the NNG codon. If GLN is 
selected, at the next round we might use the vg codon 
VAS that encodes three of the seven excluded 
possibilities, viz. HIS, ASN, and ASP. The codon VAS
encodes six amino acids in six DNA sequences.
This leaves PHE, CYS, TYR, and ILE untested, but these 
are all very hydrophobic. Switching to NNT would be 
undesirable because that would exclude GLN. One could 
use NAS that includes TYR and <stop>. Suppose the 
successful amino acid encoded by an NNG codon was ARG.
Here we switch to NNT because this allows ARG plus all
the excluded possibilities.

THR is another possibility with the NNT codon. If
THR is selected, we switch to NNG because that includes
the previously excluded possibilities and THR.
Suppose the successful amino acid encoded by the NNT
codon was ASP. We use RRS because this includes both
acidic amino acids plus LYS and ARG. One could also use
VRS to allow GLN.

Thus later rounds of variegation test both amino
acid positions not previously mutated, and amino acid
substitutions at a previously mutated position which
were not within the previous substitution set.

If the first round of variegation is entirely
unsuccessful, a different pattern of variegation should
be used. For example, if more than one interaction set
can be defined within a domain, the residues varied in
the next round of variegation should be from a different
set than that probed in the initial variegation. If
repeated failures are encountered, one may switch to a
different IPBD.
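
The second-round codon choices discussed in this section can be checked mechanically: a candidate codon should retain the amino acid selected in the first round and cover as many as possible of the amino acids that the first-round codon excluded. The sketch below is illustrative only, assuming the standard genetic code and IUPAC ambiguity codes; the function `evaluate` and its result keys are hypothetical names.

```python
from itertools import product

# Standard genetic code with bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT",
         "V": "ACG", "S": "CG", "R": "AG"}
ALL_AMINO = set(AMINO) - {"*"}


def amino_set(vg_codon):
    """Amino acids (one-letter codes) encoded by a variegated codon."""
    codons = ("".join(c) for c in product(*(IUPAC[b] for b in vg_codon)))
    return {CODON_TABLE[c] for c in codons} - {"*"}


def evaluate(first_round, selected, candidate):
    """Compare a candidate second-round codon with the first-round codon."""
    excluded = ALL_AMINO - amino_set(first_round)
    offered = amino_set(candidate)
    return {
        "keeps_selected": selected in offered,
        "newly_tested": sorted(excluded & offered),
        "still_untested": sorted(excluded - offered),
    }


print(evaluate("NNG", "Q", "VAS"))   # GLN selected from NNG, try VAS
print(evaluate("NNG", "R", "NNT"))   # ARG selected from NNG, switch to NNT
print(evaluate("NNT", "P", "NNG"))   # PRO selected from NNT, switch to NNG
print(evaluate("NNT", "D", "RRS"))   # ASP selected from NNT, try RRS
```

For the GLN case this reports HIS, ASN, and ASP as newly tested and PHE, CYS, TYR, and ILE as still untested, matching the discussion above; for ARG, switching to NNT leaves nothing untested.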

IV. DISPLAY STRATEGY: DISPLAYING FOREIGN BINDING
DOMAINS ON THE SURFACE OF A "GENETIC PACKAGE"

IV.A. General Requirements for Genetic Packages

It is emphasized that the GP on which
selection-through-binding will be practiced must be capable, after
the selection, either of growth in some suitable
environment or of in vitro amplification and recovery of
the encapsulated genetic message. During at least part
of the growth, the increase in number is preferably
approximately exponential with respect to time. The
component of a population that exhibits the desired
binding properties may be quite small, for example, one
in 10^6 or less. Once this component of the population is
separated from the non-binding components, it must be
possible to amplify it. Culturing viable cells is the
most powerful amplification of genetic material known
and is preferred. Genetic messages can also be
amplified in vitro, but this is not a
most preferred method.

Preferred GPs are vegetative bacterial cells,
bacterial spores and bacterial DNA viruses. Eukaryotic
cells could be used as genetic packages but have longer
dividing times and more stringent nutritional
requirements than do bacteria and it is much more
difficult to produce a large number of
transformants. They are also more fragile than bacterial
cells and therefore more difficult to chromatograph
without damage. Eukaryotic viruses could be used
instead of bacteriophage but must be propagated in
eukaryotic cells and therefore suffer from some of the
amplification problems mentioned above.

Nonetheless, a strain of any living cell or virus
is potentially useful if the strain can be: 1)
genetically altered with reasonable facility to encode a
potential binding domain, 2) maintained and amplified in
culture, 3) manipulated to display the potential binding
protein domain where it can interact with the target
material during affinity separation, and 4) affinity
separated while retaining the genetic information
encoding the displayed binding domain in recoverable
form. Preferably, the GP remains viable after affinity
separation.

When the genetic package is a bacterial cell, or a
phage which is assembled periplasmically, the display
means has two components. The first component is a
secretion signal which directs the initial expression
product to the inner membrane of the cell (a host cell
when the package is a phage). This secretion signal is
cleaved off by a signal peptidase to yield a processed,
mature, potential binding protein. The second component
is an outer surface transport signal which directs the
package to assemble the processed protein into its outer
surface. Preferably, this outer surface transport
signal is derived from a surface protein native to the
genetic package.

For example, in a preferred embodiment, the hybrid
gene comprises a DNA encoding a potential binding domain
operably linked to a signal sequence (e.g., the signal
sequences of the bacterial phoA or bla genes or the
signal sequence of M13 phage geneIII) and to DNA
encoding a coat protein (e.g., the M13 gene III or gene
VIII proteins) of a filamentous phage (e.g., M13). The
expression product is transported to the inner membrane
(lipid bilayer) of the host cell, whereupon the signal 
peptide is cleaved off to leave a processed hybrid 
protein. The C-terminus of the coat protein-like
component of this hybrid protein is trapped in the lipid
bilayer, so that the hybrid protein does not escape into
the periplasmic space. (This is typical of the wild-type
coat protein.) As the single-stranded DNA of the
nascent phage particle passes into the periplasmic
space, it collects both wild-type coat protein and the
hybrid protein from the lipid bilayer. The hybrid
protein is thus packaged into the surface sheath of the 
filamentous phage, leaving the potential binding domain 
exposed on its outer surface. (Thus, the filamentous 
phage, not the host bacterial cell, is the "replicable 
genetic package" in this embodiment.) 

If a secretion signal is necessary for the display 
of the potential binding domain, in an especially 
preferred embodiment the bacterial cell in which the 
hybrid gene is expressed is of a "secretion-permissive" 
strain. 

When the genetic package is a bacterial spore, or a 
phage whose coat is assembled intracellularly, a
secretion signal directing the expression product to the 
inner membrane of the host bacterial cell is 
unnecessary. In these cases, the display means is 
merely the outer surface transport signal, typically a 
derivative of a spore or phage coat protein.

There are several methods of arranging that the
ipbd gene is expressed in such a manner that the IPBD is
displayed on the outer surface of the GP. If one or
more fusions of fragments of x genes to fragments of a
natural osp gene are known to cause X protein domains to
appear on the GP surface, then we pick the DNA sequence
in which an ipbd gene fragment replaces the x gene
fragment in one of the successful osp-x fusions as a
preferred gene to be tested for the display-of-IPBD
phenotype. (The gene may be constructed in any manner.)
If no fusion data are available, then we fuse an ipbd
fragment to various fragments, such as fragments that
end at known or predicted domain boundaries, of the osp
gene and obtain GPs that display the osp-ipbd fusion on
the GP outer surface by screening or selection for the
display-of-IPBD phenotype. The OSP may be modified so
as to increase the flexibility and/or length of the 
linkage between the OSP and the IPBD and thereby reduce 
interference between the two. 

The fusion of ipbd and osp fragments may also 
include fragments of random or pseudorandom DNA to 
produce a population, members of which may display IPBD 
on the GP surface. The members displaying IPBD are 
isolated by screening or selection for the
display-of-binding phenotype.

The replicable genetic entity (phage or plasmid)
that carries the osp-pbd genes (derived from the
osp-ipbd gene) through the selection-through-binding
process is referred to hereinafter as the operative
cloning vector (OCV). When the OCV is a phage, it may
also serve as the genetic package. The choice of a GP
is dependent in part on the availability of a suitable 
OCV and suitable OSP. 

Preferably, the GP is readily stored, for example, 
by freezing. If the GP is a cell, it should have a 
short doubling time, such as 20-40 minutes. If the GP
is a virus, it should be prolific, e.g., a burst size of
at least 100/infected cell. GPs which are finicky or
expensive to culture are disfavored. The GP should be
easy to harvest, preferably by centrifugation. The GP
is preferably stable for a temperature range of -70 to
42°C (stable at 4°C for several days or weeks);
resistant to shear forces found in HPLC; insensitive to
UV; tolerant of desiccation; and resistant to a pH of
2.0 to 10.0, surface active agents such as SDS or
Triton, chaotropes such as 4M urea or 2M guanidinium
HCl, common ions such as K+, Na+, and SO4^2-, common
organic solvents such as ether and acetone, and 
degradative enzymes. Finally, there must be a suitable 
OCV. 

Although knowledge of specific OSPs may not be 
required for vegetative bacterial cells and endospores, 
the user of the present invention, preferably, will 
know: Is the sequence of any osp known? (preferably
yes, at least one required for phage). How does the OSP
arrive at the surface of GP? (knowledge of route
necessary, different routes have different uses, no
route preferred per se). Is the OSP
post-translationally processed? (no processing most
preferred, predictable processing preferred over
unpredictable processing). What rules are known
governing this processing, if there is any processing?
(no processing most preferred, predictable processing
acceptable). What function does the OSP serve in the
outer surface? (preferably not essential). Is the 3D
structure of an OSP known? (highly preferred). Are
fusions between fragments of osp and a fragment of x
known? Does expression of these fusions lead to X
appearing on the surface of the GP? (fusion data is as
preferred as knowledge of a 3D structure). Is a "2D"
structure of an OSP available? (in this context, a "2D"
structure indicates which residues are exposed on the
cell surface) (2D structure less preferred than 3D
structure). Where are the domain boundaries in the OSP?
(not as preferred as a 2D structure, but acceptable).
Could IPBD go through the same process as OSP and fold
correctly? (IPBD might need prosthetic groups)
(preferably IPBD will fold after same process). Is the
sequence of an osp promoter known? (preferably yes).
Is an osp gene controlled by a regulatable promoter
available? (preferably yes). What activates this
promoter? (preferably a diffusible chemical, such as
IPTG). How many different OSPs do we know? (the more
the better). How many copies of each OSP are present on
each package? (more is better).

The user will want knowledge of the physical 
attributes of the GP: How large is the GP? (knowledge 
useful in deciding how to isolate GPs) (preferably easy 
to separate from soluble proteins such as IgGs). What
is the charge on the GP? (neutral preferred). What is
the sedimentation rate of the GP? (knowledge preferred,
no particular value preferred).

The preferred GP, OCV and OSP are those for which 
the fewest serious obstacles can be seen, rather than 
the one that scores highest on any one criterion. 

Viruses are preferred over bacterial cells and 
spores (cp. LUIT85 and references cited therein). The 
virus is preferably a DNA virus with a genome size of 2
kb to 10 kb, such as (but not limited to) the
filamentous (Ff) phage M13, fd, and f1 (inter alia see
RASC86, BOEK80, BOEK82, DAYL88, GRAY81b, KUHN88, LOPE85,
WEBS85, MARV75, MARV80, MOSE82, CRIS84, SMIT88a,
SMIT88b); the IncN-specific phage Ike and If1 (NAKA81,
PEET85, PEET87, THOM83, THOM88a); IncP-specific
Pseudomonas aeruginosa phage Pf1 (THOM83, THOM88a) and
Pf3 (LUIT83, LUIT85, LUTI87, THOM88a); and the
Xanthomonas oryzae phage Xf (THOM83, THOM88a).
Filamentous phage are especially preferred. 

Preferred OSPs for several GPs are given in Table 
2. References to osp-ipbd fusions in this section
should be taken to apply, mutatis mutandis, to osp-pbd
and osp-sbd fusions as well. 

The species chosen as a GP should have a well- 
characterized genetic system and strains defective in 
genetic recombination should be available. The chosen 
strain may need to be manipulated to prevent changes of 
its physiological state that would alter the number or 
type of proteins or other molecules on the cell surface 
during the affinity separation procedure.

IV.B. Phages for Use as GPs:

Unlike bacterial cells and spores, choice of a 
phage depends strongly on knowledge of the 3D structure 
of an OSP and how it interacts with other proteins in 
the capsid. This does not mean that we need atomic 
resolution of the OSP, but that we need to know which 
segments of the OSP interact to make the viral coat and 
which segments are not constrained by structural or 
functional roles. The size of the phage genome and the 
packaging mechanism are also important because the phage 
genome itself is the cloning vector. The osp-ipbd gene 
is inserted into the phage genome; therefore: 1) the 
genome of the phage must allow introduction of the
osp-ipbd gene either by tolerating additional genetic
material or by having replaceable genetic material; 2)
the virion must be capable of packaging the genome after
accepting the insertion or substitution of genetic
material; and 3) the display of the OSP-IPBD protein on
the phage surface must not disrupt virion structure
sufficiently to interfere with phage propagation. 

The morphogenetic pathway of the phage determines 
the environment in which the IPBD will have opportunity 
to fold. Periplasmically assembled phage are preferred 
when IPBDs contain essential disulfides, as such IPBDs 
may not fold within a cell (these proteins may fold 
after the phage is released from the cell).
Intracellularly assembled phage are preferred when the
IPBD needs large or insoluble prosthetic groups (such as
Fe4S4 clusters), since the IPBD may not fold if secreted
because the prosthetic group is lacking. 

When variegation is introduced in Part II, multiple 
infections could generate hybrid GPs that carry the gene 
for one PBD but have at least some copies of a different 
PBD on their surfaces; it is preferable to minimize this 
possibility by infecting cells with phage under
conditions resulting in a low multiplicity of infection
(MOI).
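
The benefit of a low MOI can be estimated with the usual Poisson model of infection. The sketch below is an illustration under that assumption (phage adsorb to cells independently and at random); the function name and the MOI values are hypothetical and not taken from the specification.

```python
import math


def fraction_multiply_infected(moi):
    """Among infected cells, the fraction that received two or more phage,
    assuming the number of phage per cell is Poisson with mean `moi`."""
    p_none = math.exp(-moi)
    p_one = moi * math.exp(-moi)
    p_infected = 1.0 - p_none
    return (1.0 - p_none - p_one) / p_infected


for moi in (0.01, 0.1, 1.0, 3.0):   # hypothetical MOI values
    print(moi, round(fraction_multiply_infected(moi), 3))
```

At low MOI the fraction of infected cells that could produce hybrid GPs is roughly half the MOI, so reducing the MOI tenfold reduces the expected proportion of such hybrids about tenfold.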

Bacteriophages are excellent candidates for GPs 
because there is little or no enzymatic activity 
associated with intact mature phage, and because the 
genes are inactive outside a bacterial host, rendering 
the mature phage particles metabolically inert. 

The filamentous phages (e.g., M13) are of
particular interest.

For a given bacteriophage, the preferred OSP is 
usually one that is present on the phage surface in the
largest number of copies, as this allows the greatest
flexibility in varying the ratio of OSP-IPBD to
wild-type OSP and also gives the highest likelihood of
obtaining satisfactory affinity separation. Moreover, a
protein present in only one or a few copies usually
performs an essential function in morphogenesis or
infection; mutating such a protein by addition or
insertion is likely to result in reduction in viability
of the GP. Nevertheless, an OSP such as M13 gIII
protein may be an excellent choice as OSP to cause
display of the PBD.

It is preferred that the wild-type osp gene be 
preserved. The ipbd gene fragment may be inserted 
either into a second copy of the recipient osp gene or 
into a novel engineered osp gene. It is preferred that 
the osp-ipbd gene be placed under control of a regulated
promoter. Our process forces the evolution of the PBDs
derived from IPBD so that some of them develop a novel
function, viz. binding to a chosen target. Placing the
gene that is subject to evolution on a duplicate gene is 
an imitation of the widely-accepted scenario for the 
evolution of protein families. It is now generally 
accepted that gene duplication is the first step in the 
evolution of a protein family from an ancestral protein. 
By having two copies of a gene, the affected 
physiological process can tolerate mutations in one of 
the genes. This process is well understood and
documented for the globin family (cf. DICK83, p65ff, and
CREI84, p117-125).

The user must choose a site in the candidate OSP
gene for inserting an ipbd gene fragment. The coats of
most bacteriophage are highly ordered. Filamentous 
phage can be described by a helical lattice; isometric 
phage, by an icosahedral lattice. Each monomer of each 
major coat protein sits on a lattice point and makes 
defined interactions with each of its neighbors. 
Proteins that fit into the lattice by making some, but 
not all, of the normal lattice contacts are likely to 
destabilize the virion by: a) aborting formation of the 
virion, b) making the virion unstable, or c) leaving 
gaps in the virion so that the nucleic acid is not 
protected. Thus in bacteriophage, unlike the cases of 
bacteria and spores, it is important to retain in 
engineered OSP-IPBD fusion proteins those residues of
the parental OSP that interact with other proteins in
the virion. For M13 gVIII, we retain the entire mature
protein, while for M13 gIII, it might suffice to retain
the last 100 residues (or even fewer). Such a truncated
gIII protein would be expressed in parallel with the
complete gIII protein, as gIII protein is required for
phage infectivity.

Il'ichev et al. (ILIC89) have reported viable phage
having alterations in gene VIII. In one case, a point
mutation changed one amino acid near the amino terminus
of the mature gVIII protein from GLU to ASP. In the
other case, five amino acids were inserted at the site
of the first mutation. They suggested that similar 
constructions could be used for vaccines. They did not 
report on any binding properties of the modified phage, 
nor did they suggest mutagenizing the inserted material. 
Furthermore, they did not insert a binding domain, nor 
did they suggest inserting such a domain. 

Further considerations on the design of the
ipbd::osp gene are discussed in Section IV.F.

Filamentous phage:

Compared to other bacteriophage, filamentous phage 
in general are attractive and M13 in particular is 
especially attractive because: 1) the 3D structure of 
the virion is known; 2) the processing of the coat 
protein is well understood; 3) the genome is expandable; 
4) the genome is small; 5) the sequence of the genome is 
known; 6) the virion is physically resistant to shear, 
heat, cold, urea, guanidinium Cl, low pH, and high salt;
7) the phage is a sequencing vector so that sequencing
is especially easy; 8) antibiotic-resistance genes have
been cloned into the genome with predictable results
(HINE80); 9) it is easily cultured and stored (FRIT85),
with no unusual or expensive media requirements for the
infected cells; 10) it has a high burst size, each
infected cell yielding 100 to 1000 M13 progeny after 
infection; and 11) it is easily harvested and 
concentrated (SALI64, FRIT85).

The filamentous phage include M13, f1, fd, If1,
Ike, Xf, Pf1, and Pf3.

The entire life cycle of the filamentous phage M13,
a common cloning and sequencing vector, is well
understood. M13 and f1 are so closely related that we
consider the properties of each relevant to both
(RASC86); any differentiation is for historical
accuracy. The genetic structure (the complete sequence,
the identity and function of the genes,
and the order of transcription and location of the
promoters) of M13 is well known as is the physical
structure of the virion (BOEK80, ITOK79,
KUHN87, MAKO80, and others); see
RASC86 for a recent review of the structure and function
of the coat proteins. Because the genome is small (6423
bp), cassette mutagenesis is practical on RF M13
(AUSU87), as is single-stranded oligo-nt directed
mutagenesis (FRIT85). M13 is a plasmid and
transformation system in itself, and an ideal sequencing
vector. M13 can be grown on Rec- strains of E. coli.
The M13 genome is expandable (MESS78, FRIT85) and M13
does not lyse cells. Because the M13 genome is extruded
through the membrane and coated by a large number of
identical protein molecules, it can be used as a cloning
vector (WATS87 p278, and MESS77). Thus we can insert
extra genes into M13 and they will be carried along in a
stable manner.

Marvin and collaborators (MARV78, MAKO80, BANN81)
have determined an approximate 3D virion structure of f1
by a combination of genetics, biochemistry, and X-ray
diffraction from fibers of the virus. Figure 4 is drawn
after the model of Banner et al. (BANN81) and shows only
the Cαs of the protein. The apparent holes in the
cylindrical sheath are actually filled by protein side
groups so that the DNA within is protected. The amino
terminus of each protein monomer is to the outside of
the cylinder, while the carboxy terminus is at smaller
radius, near the DNA. Although other filamentous phages
(e.g. Pf1 or Ike) have different helical symmetry, all
have coats composed of many short α-helical monomers
with the amino terminus of each monomer on the virion
surface.

The major coat protein is encoded by gene VIII. 
The 50 amino acid mature gene VIII coat protein is 
synthesized as a 73 amino acid precoat (ITOK79). The
first 23 amino acids constitute a typical signal
sequence which causes the nascent polypeptide to be
inserted into the inner cell membrane. Whether the 
precoat inserts into the membrane by itself or through 
the action of host secretion components, such as SecA 
and SecY, remains controversial, but has no effect on 
the operation of the present invention. 

An E. coli signal peptidase (SP-I) recognizes amino
acids 18, 21, and 23, and, to a lesser extent, residue
22, and cuts between residues 23 and 24 of the precoat
(KUHN85a, KUHN85b, OLIV87). After removal of the signal
sequence, the amino terminus of the mature coat is 
located on the periplasmic side of the inner membrane; 
the carboxy terminus is on the cytoplasmic side. About 
3000 copies of the mature 50 amino acid coat protein 
associate side-by-side in the inner membrane. 

The sequence of gene VIII is known, and the amino
acid sequence can be encoded on a synthetic gene placed
under the lacUV5 promoter and used in conjunction with
the LacI^q repressor. The lacUV5 promoter is induced by
IPTG. Mature gene VIII protein makes up the sheath
around the circular ssDNA. The 3D structure of the f1
virion is known at medium resolution; the amino terminus
of gene VIII protein is on the surface of the virion. A
few modifications of gene VIII have been made and are
discussed below. The 2D structure of M13 coat protein
is implicit in the 3D structure. Mature M13 gene VIII
protein has only one domain.

When the GP is M13, the gene III and the gene VIII
proteins are highly preferred as OSP (see Examples I
through IV). The proteins from genes VI, VII, and IX
may also be used.

As discussed in the Examples, we have constructed a 
tripartite gene comprising: 

1) DNA encoding a signal sequence directing secretion 
of parts (2) and (3) through the inner membrane, 

2) DNA encoding the mature BPTI sequence, and 

3) DNA encoding the mature M13 gVIII protein.

This gene causes BPTI to appear in active form on the
surface of M13 phage.

The gene VIII protein is a preferred OSP because it
is present in many copies and because its location and
orientation in the virion are known (BANN81).
Preferably, the PBD is attached to the amino terminus of
the mature M13 coat protein. Had direct fusion of PBD
to M13 CP failed to cause PBD to be displayed on the
surface of M13, we would have varied part of the
mini-protein sequence and/or inserted short random or
nonrandom spacer sequences between mini-protein and M13
CP. The 3D model of f1 indicates strongly that fusing
IPBD to the amino terminus of M13 CP is more likely to
yield a functional chimeric protein than any other
fusion site.

Similar constructions could be made with other
filamentous phage. Pf3 is a well known filamentous
phage that infects Pseudomonas aeruginosa cells that
harbor an IncP-1 plasmid. The entire genome has been
sequenced (LUIT85) and the genetic signals involved in
replication and assembly are known (LUIT87). The major
coat protein of Pf3 is unusual in having no signal
peptide to direct its secretion. The sequence has
charged residues ASP7, ARG37, LYS40, and PHE44-COO-,
which is consistent with the amino terminus being
exposed. Thus, to cause an IPBD to appear on the surface
of Pf3, we construct a tripartite gene comprising:

1) a signal sequence known to cause secretion in P.
aeruginosa (preferably known to cause secretion of
IPBD) fused in-frame to,

2) a gene fragment encoding the IPBD sequence, fused
in-frame to,

3) DNA encoding the mature Pf3 coat protein.

Optionally, DNA encoding a flexible linker of one
to 10 amino acids is introduced between the ipbd gene
fragment and the Pf3 coat-protein gene. Optionally, DNA
encoding the recognition site for a specific protease,
such as tissue plasminogen activator or blood clotting
Factor Xa, is introduced between the ipbd gene fragment
and the Pf3 coat-protein gene. Amino acids that form
the recognition site for a specific protease may also
serve the function of a flexible linker. This
tripartite gene is introduced into Pf3 so that it does
not interfere with expression of any Pf3 genes. To
reduce the possibility of genetic recombination, part
(3) is designed to have numerous silent mutations
relative to the wild-type gene. Once the signal
sequence is cleaved off, the IPBD is in the periplasm
and the mature coat protein acts as an anchor and
phage-assembly signal. It matters not that this fusion
protein comes to rest anchored in the lipid bilayer by a
route different from the route followed by the wild-type
coat protein.
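
By way of illustration only, the design of part (3) with
silent mutations can be sketched in Python. The sketch is
not the recoding actually applied to the Pf3 coat-protein
gene: the function names, the short example sequence, and
the rule of rotating each codon to the next synonymous
codon in alphabetical order are assumptions made for the
example. It replaces each codon with a different codon for
the same amino acid, where one exists, and confirms that
the encoded protein is unchanged.

```python
from collections import defaultdict

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group codons by the amino acid they encode (alphabetical within a group).
SYNONYMS = defaultdict(list)
for codon, amino_acid in sorted(CODON_TABLE.items()):
    SYNONYMS[amino_acid].append(codon)


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino-acid codes."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode_silently(dna):
    """Swap every codon for the next synonymous codon (illustrative rule only)."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    recoded = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        choices = SYNONYMS[CODON_TABLE[codon]]
        # Met and Trp have a single codon, so they are left unchanged.
        recoded.append(choices[(choices.index(codon) + 1) % len(choices)])
    return "".join(recoded)


# Hypothetical wild-type fragment (not the Pf3 gene); it encodes MKKSLVLKA.
wild_type = "ATGAAAAAGTCTCTGGTTCTGAAAGCT"
recoded = recode_silently(wild_type)
changed = sum(a != b for a, b in zip(wild_type, recoded))
```

A practical design would also weigh the codon usage of
the host, which this sketch ignores.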

The amino-acid sequence of M13 pre-coat (SCHA78),
called AA_seq1, is

AA_seq1

         1    1    2  | 2    3    3    4    4    5
    5    0    5    0  v 5    0    5    0    5    0
MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA

    5    6    6    7  7
    5    0    5    0  3
MVVVIVGATIGIKLFKKFTSKAS

The single-letter codes for amino acids and the codes
for ambiguous DNA are given in Table 1. The best site
for inserting a novel protein domain into M13 CP is
after A23 because SP-I cleaves the precoat protein after
A23, as indicated by the arrow. Proteins that can be
secreted will appear connected to mature M13 CP at its
amino terminus. Because the amino terminus of mature
M13 CP is located on the outer surface of the virion,
the introduced domain will be displayed on the outside
of the virion. The uncertainty of the mechanism by
which M13 CP appears in the lipid bilayer raises the
possibility that direct insertion of bpti into gene VIII
may not yield a functional fusion protein. It may be
necessary to change the signal sequence of the fusion
to, for example, the phoA signal sequence
(MKQSTIALALLPLLFTPVTKA). Marks et al. (MARK86)
showed that the phoA signal peptide could direct mature
BPTI to the E. coli periplasm.
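
By way of illustration only, the insertion site after A23
can be sketched in Python. The sketch assumes the
73-residue form of AA_seq1 and the phoA signal sequence
quoted above; the function names and the five-residue
placeholder domain are hypothetical and stand in for a
real IPBD such as BPTI.

```python
# 73-residue M13 precoat (AA_seq1); SP-I cuts after residue 23.
PRECOAT = (
    "MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA"
    "MVVVIVGATIGIKLFKKFTSKAS"
)
PHOA_SIGNAL = "MKQSTIALALLPLLFTPVTKA"
CLEAVAGE_AFTER = 23


def split_precoat(precoat, cleavage_after=CLEAVAGE_AFTER):
    """Return (signal sequence, mature coat protein)."""
    return precoat[:cleavage_after], precoat[cleavage_after:]


def build_fusion(signal, domain, mature_coat):
    """Place the domain between the signal and the mature coat protein."""
    return signal + domain + mature_coat


signal, mature = split_precoat(PRECOAT)

# Hypothetical placeholder domain; NOT the BPTI sequence.
DOMAIN = "GSGSG"

# Fusion using the native gene VIII signal sequence.
fusion = build_fusion(signal, DOMAIN, mature)

# Alternative fusion using the phoA signal sequence instead.
phoa_fusion = build_fusion(PHOA_SIGNAL, DOMAIN, mature)
```

After cleavage of either signal, the domain sits at the
amino terminus of the mature coat protein, which is the
arrangement the text describes.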

Another vehicle for displaying the IPBD is by
expressing it as a domain of a chimeric gene containing
part or all of gene III. This gene encodes one of the
minor coat proteins of M13. Genes VI, VII, and IX also
encode minor coat proteins. Each of these minor
proteins is present in about 5 copies per virion and is
related to morphogenesis or infection. In contrast, the
major coat protein is present in more than 2500 copies
per virion. The gene VI, VII, and IX proteins are
present at the ends of the virion; these three proteins
are not post-translationally processed (RASC86).

The single-stranded circular phage DNA associates
with about five copies of the gene III protein and is
then extruded through the patch of membrane-associated
coat protein in such a way that the DNA is encased in a
helical sheath of protein (WEBS78). The DNA does not
base pair (that would impose severe restrictions on the
virus genome); rather the bases intercalate with each
other independent of sequence.

Smith (SMIT85) and de la Cruz et al. (DELA88) have
shown that insertions into gene III cause novel protein
domains to appear on the virion outer surface. The
mini-protein's gene may be fused to gene III at the site
used by Smith and by de la Cruz et al., at a codon
corresponding to another domain boundary or to a surface
loop of the protein, or to the amino terminus of the
mature protein.

All published works use a vector containing a
single modified gene III of fd. Thus, all five copies
of gIII are identically modified. Gene III is quite
large (1272 b.p. or about 20% of the phage genome) and
it is uncertain whether a duplicate of the whole gene
can be stably inserted into the phage. Furthermore, all
five copies of gIII protein are at one end of the
virion. When bivalent target molecules (such as
antibodies) bind a pentavalent phage, the resulting
complex may be irreversible. Irreversible binding of
the GP to the target greatly interferes with affinity
enrichment of the GPs that carry the genetic sequences
encoding the novel polypeptide having the highest
affinity for the target.

To reduce the likelihood of formation of
irreversible complexes, we may use a second, synthetic
gene that encodes carboxy-terminal parts of III. We
might, for example, engineer a gene that consists of
(from 5' to 3'):

1) a promoter (preferably regulated),

2) a ribosome-binding site,

3) an initiation codon,

4) a functional signal peptide directing secretion of
parts (5) and (6) through the inner membrane,

5) DNA encoding an IPBD,

6) DNA encoding residues 275 through 424 of M13 gIII
protein,

7) a translation stop codon, and

8) (optionally) a transcription stop signal.

We leave the wild-type gene III so that some unaltered
gene III protein will be present. Alternatively, we may
use gene VIII protein as the OSP and regulate the
osp::ipbd fusion so that only one or a few copies of the
fusion protein appear on the phage.
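
By way of illustration only, the eight-part synthetic gene
listed above can be laid out as an ordered table and the
size of its translated region estimated. Only the gIII
span (residues 275 through 424) comes from the text; the
21-residue signal (the length of the phoA signal quoted
elsewhere in this section) and the 58-residue IPBD (the
length of BPTI) are assumptions chosen for the example,
and the names `Part`, `CASSETTE`, and `orf_length_nt` are
hypothetical.

```python
from typing import NamedTuple, Optional


class Part(NamedTuple):
    order: int
    name: str
    residues: Optional[int] = None  # translated length; None if not counted


def span(first, last):
    """Number of residues in an inclusive residue range."""
    return last - first + 1


GIII_FRAGMENT = span(275, 424)  # from the text
SIGNAL_RESIDUES = 21            # assumption: phoA-length signal, Met included
IPBD_RESIDUES = 58              # assumption: an IPBD the size of BPTI

CASSETTE = [
    Part(1, "promoter (preferably regulated)"),
    Part(2, "ribosome-binding site"),
    Part(3, "initiation codon"),  # counted as the first signal residue
    Part(4, "signal peptide", SIGNAL_RESIDUES),
    Part(5, "IPBD", IPBD_RESIDUES),
    Part(6, "gIII residues 275-424", GIII_FRAGMENT),
    Part(7, "translation stop codon"),
    Part(8, "transcription stop signal (optional)"),
]


def orf_length_nt(parts):
    """Nucleotides from the initiation codon through the stop codon."""
    codons = sum(p.residues for p in parts if p.residues is not None)
    return 3 * codons + 3  # three nucleotides per codon, plus the stop codon


orf_nt = orf_length_nt(CASSETTE)
```

Under these assumptions the translated region is well
under the 1272 b.p. of the complete gene III.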

M13 gene VI, VII, and IX proteins are not processed 
after translation. The route by which these proteins
are assembled into the phage has not been reported.
These proteins are necessary for normal morphogenesis 
and infectivity of the phage. Whether these molecules 
(gene VI protein, gene VII protein, and gene IX protein) 
attach themselves to the phage: a) from the cytoplasm, 
b) from the periplasm, or c) from within the lipid 
bilayer, is not known. One could use any of these 
proteins to introduce an IPBD onto the phage surface by 
one of the constructions: 

1) ipbd::pmcp,

2) pmcp::ipbd,

3) signal::ipbd::pmcp, and

4) signal::pmcp::ipbd,

where ipbd represents DNA coding on expression for the
initial potential binding domain; pmcp represents DNA
coding for one of the phage minor coat proteins, VI,
VII, and IX; signal represents a functional secretion
signal peptide, such as the phoA signal
(MKQSTIALALLPLLFTPVTKA); and "::" represents in-frame
genetic fusion. The indicated fusions are placed
downstream of a known promoter, preferably a regulated
promoter such as lacUV5, tac, or trp. Fusions (1) and
(2) are appropriate when the minor coat protein attaches
to the phage from the cytoplasm or by autonomous
insertion into the lipid bilayer. Fusion (1) is
appropriate if the amino terminus of the minor coat
protein is free and (2) is appropriate if the carboxy
terminus is free. Fusions (3) and (4) are appropriate
if the minor coat protein attaches to the phage from the
periplasm or from within the lipid bilayer. Fusion (3)
is appropriate if the amino terminus of the minor coat
protein is free and (4) is appropriate if the carboxy
terminus is free.
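
By way of illustration only, the choice among the four
constructions can be written as a small lookup in Python.
The function name and the string labels for the
attachment routes are hypothetical; the mapping itself
follows the rules stated above.

```python
# Routes for which no secretion signal is added (fusions 1 and 2).
CYTOPLASMIC_ROUTES = {"cytoplasm", "autonomous insertion"}
# Routes for which a secretion signal is added (fusions 3 and 4).
PERIPLASMIC_ROUTES = {"periplasm", "within lipid bilayer"}


def choose_fusion(route, free_terminus):
    """Pick a construction from the attachment route and the free terminus."""
    if free_terminus == "amino":
        core = "ipbd::pmcp"   # IPBD joined at the free amino terminus
    elif free_terminus == "carboxy":
        core = "pmcp::ipbd"   # IPBD joined at the free carboxy terminus
    else:
        raise ValueError("free_terminus must be 'amino' or 'carboxy'")

    if route in CYTOPLASMIC_ROUTES:
        return core
    if route in PERIPLASMIC_ROUTES:
        return "signal::" + core
    raise ValueError("unknown attachment route: " + route)
```

Because the route and the free terminus are not known for
the gene VI, VII, and IX proteins, all four combinations
would in practice have to be tried.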

Bacteriophage ΦX174:

The bacteriophage ΦX174 is a very small icosahedral
virus which has been thoroughly studied by genetics,
biochemistry, and electron microscopy (see The
Single-Stranded DNA Phages (DENH78)). To date, no
proteins from ΦX174 have been studied by X-ray
diffraction. ΦX174 is not used as a cloning vector
because it can accept very little additional DNA; the
virus is so tightly constrained that several of its
genes overlap. Chambers et al. (CHAM82) showed that
mutants in gene G are rescued by the wild-type G gene
carried on a plasmid so that the host supplies this
protein.

Three gene products of ΦX174 are present on the
outside of the mature virion: F (capsid), G (major spike
protein, 60 copies per virion), and H (minor spike
protein, 12 copies per virion). The G protein comprises
175 amino acids, while H comprises 328 amino acids. The
F protein interacts with the single-stranded DNA of the
virus. The proteins F, G, and H are translated from a
single mRNA in the virus-infected cells. If the G
protein is supplied from a plasmid in the host, then the
viral g gene is no longer essential. We introduce one
or more stop codons into g so that no G is produced from
the viral gene. We fuse a pbd gene fragment to h,
either at the 3' or 5' terminus. We eliminate an amount
of the viral g gene equal to the size of pbd so that the
size of the genome is unchanged.

Large DNA Phages

Phage such as λ or T4 have much larger genomes than
do M13 or ΦX174. Large genomes are less conveniently
manipulated than small genomes. Phage λ has such a
large genome that cassette mutagenesis is not
practicable. One cannot use annealing of a mutagenic
oligonucleotide either, because there is no ready supply
of single-stranded λ DNA. (λ DNA is packaged as
double-stranded DNA.) Phage such as λ and T4 have more
complicated 3D capsid structures than M13 or ΦX174, with
more OSPs to choose from. Intracellular morphogenesis
of phage λ could cause protein domains that contain
disulfide bonds in their folded forms not to fold.

Phage λ virions and phage T4 virions form
intracellularly, so that IPBDs requiring large or
insoluble prosthetic groups might fold on the surfaces
of these phage.

RNA Phages

RNA phage are not preferred because manipulation of
RNA is much less convenient than is the manipulation of
DNA. If the RNA phage MS2 were modified to make room
for an osp-ipbd gene and if a message containing the A
protein binding site and the gene for a chimera of coat
protein and a PBD were produced in a cell that also
contained A protein and wild-type coat protein (both
produced from regulated genes on a plasmid), then the
RNA coding for the chimeric protein would get packaged.
A package comprising RNA encapsulated by proteins
encoded by that RNA satisfies the major criterion that
the genetic message inside the package specifies
something on the outside. The particles by themselves
are not viable unless the modified A protein is
functional. After isolating the packages that carry an
SBD, we would need to: 1) separate the RNA from the
protein capsid; 2) reverse transcribe the RNA into DNA,
using AMV or MMTV reverse transcriptase; and 3) use
Thermus aquaticus DNA polymerase for 25 or more cycles
of Polymerase Chain Reaction™ to amplify the osp-sbd
DNA until there is enough to subclone the recovered
genetic message into a plasmid for sequencing and
further work.

Alternatively, helper phage could be used to rescue 
the isolated phage. In one of these ways we can recover 
a sequence that codes for an SBD having desirable 
binding properties. 

IV. C. Bacterial Cells as Genetic Packages: 

One may choose any well-characterized bacterial
strain which (1) may be grown in culture, (2) may be
engineered to display PBDs on its surface, and (3) is
compatible with affinity selection.

Among bacterial cells, the preferred genetic
packages are Salmonella typhimurium, Bacillus subtilis,
Pseudomonas aeruginosa, Vibrio cholerae, Klebsiella
pneumoniae, Neisseria gonorrhoeae, Neisseria
meningitidis, Bacteroides nodosus, Moraxella bovis, and
especially Escherichia coli. The potential binding
mini-protein may be expressed as an insert in a chimeric
bacterial outer surface protein (OSP). All bacteria
exhibit proteins on their outer surfaces. Works on the
localization of OSPs and the methods of determining
their structure include: CALA90, HEIJ90, EHRM90,
BENZ88a, BENZ88b, MANO88, BAKE87, RAND87, HANC87,
HENR87, NAKA86b, MANO86, SILH85, TOMM85, NIKA84, LUGT83,
and BECK83.

In E. coli, LamB is a preferred OSP. As discussed
below, there are a number of very good alternatives in
E. coli and there are very good alternatives in other
bacterial species. There are also methods for
determining the topology of OSPs so that it is possible
to systematically determine where to insert an ipbd into
an osp gene to obtain display of an IPBD on the surface
of any bacterial species.

In view of the extensive knowledge of E. coli, a
strain of E. coli, defective in recombination, is the
strongest candidate as a bacterial GP.

Oliver has reviewed mechanisms of protein secretion
in bacteria (OLIV85a and OLIV87). Nikaido and Vaara
(NIKA87), Benz (BENZ88b), and Baker et al. (BAKE87) have
reviewed mechanisms by which proteins become localized
to the outer membrane of gram-negative bacteria. While
most bacterial proteins remain in the cytoplasm, others
are transported to the periplasmic space (which lies
between the plasma membrane and the cell wall of
gram-negative bacteria), or are conveyed and anchored to
the outer surface of the cell. Still others are exported
(secreted) into the medium surrounding the cell. Those
characteristics of a protein that are recognized by a
cell and that cause it to be transported out of the
cytoplasm and displayed on the cell surface will be
termed "outer-surface transport signals".

Gram-negative bacteria have outer-membrane proteins
(OMP), that form a subset of OSPs. Many OMPs span the
membrane one or more times. The signals that cause OMPs
to localize in the outer membrane are encoded in the
amino acid sequence of the mature protein. Outer
membrane proteins of bacteria are initially expressed in
a precursor form including a so-called signal peptide.
The precursor protein is transported to the inner
membrane, and the signal peptide moiety is extruded into
the periplasmic space. There, it is cleaved off by a
"signal peptidase", and the remaining "mature" protein
can now enter the periplasm. Once there, other cellular
mechanisms recognize structures in the mature protein
which indicate that its proper place is on the outer
membrane, and transport it to that location.

It is well known that the DNA coding for the leader 
or signal peptide from one protein may be attached to 
the DNA sequence coding for another protein, protein X, 
to form a chimeric gene whose expression causes protein 
X to appear free in the periplasm (BECK83, INOU86 Ch10,
LEEC86, MARK86, and BOQU87). That is, the leader causes
the chimeric protein to be secreted through the lipid 
bilayer; once in the periplasm, it is cleaved off by the 
signal peptidase SP-I. 

The use of export-permissive bacterial strains
(LISS85, STAD89) increases the probability that a
signal-sequence fusion will direct the desired protein
to the cell surface. Liss et al. (LISS85) showed that
the mutation prlA4 makes E. coli more permissive with
respect to signal sequences. Similarly, Stader et al.
(STAD89) found a strain that bears a prlG mutation and
that permits export of a protein that is blocked from
export in wild-type cells. Such export-permissive
strains are preferred.

OSP-IPBD fusion proteins need not fill a structural
role in the outer membranes of Gram-negative bacteria
because parts of the outer membranes are not highly
ordered. For large OSPs there is likely to be one or
more sites at which osp can be truncated and fused to
ipbd such that cells expressing the fusion will display
IPBDs on the cell surface. Fusions of fragments of omp
genes with fragments of an x gene have led to X
appearing on the outer membrane (CHAR88b, BENS84,
CLEM81). When such fusions have been made, we can
design an osp-ipbd gene by substituting ipbd for x in
the DNA sequence. Otherwise, a successful OMP-IPBD
fusion is preferably sought by fusing fragments of the
best omp to an ipbd, expressing the fused gene, and
testing the resultant GPs for the display-of-IPBD
phenotype. We use the available data about the OMP to
pick the point or points of fusion between omp and ipbd
to maximize the likelihood that IPBD will be displayed.
(Spacer DNA encoding flexible linkers, made, e.g., of
GLY, SER, and ASN, may be placed between the osp- and
ipbd-derived fragments to facilitate display.)
Alternatively, we truncate osp at several sites or in a
manner that produces osp fragments of variable length
and fuse the osp fragments to ipbd; cells expressing the
fusion are then screened or selected for display of
IPBDs on the cell surface. Freudl et al. (FREU89) have
shown that fragments of OSPs (such as OmpA) above a
certain size are incorporated into the outer membrane.
An additional alternative is to include short segments
of random DNA in the fusion of omp fragments to ipbd and
then screen or select the resulting variegated
population for members exhibiting the display-of-IPBD
phenotype.
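
By way of illustration only, the truncation approach can
be sketched in Python as the enumeration of in-frame
fusions between progressively longer osp fragments, a
spacer, and ipbd. The function name, the minimum-length
parameter, and both example sequences are hypothetical;
the spacer GGT TCT AAC encodes GLY-SER-ASN and is one
possible linker of the kind mentioned above.

```python
LINKER = "GGTTCTAAC"  # codons for Gly-Ser-Asn (illustrative spacer)


def truncation_fusions(osp, ipbd, min_codons=1, linker=LINKER):
    """Map each kept osp length (in codons) to an in-frame osp-linker-ipbd DNA."""
    for name, seq in (("osp", osp), ("ipbd", ipbd), ("linker", linker)):
        if len(seq) % 3:
            raise ValueError(name + " length must be a multiple of three")
    total_codons = len(osp) // 3
    fusions = {}
    for kept in range(min_codons, total_codons + 1):
        # Keep the first `kept` codons of osp, then add linker and ipbd.
        fusions[kept] = osp[:3 * kept] + linker + ipbd
    return fusions


# Hypothetical sequences for the example only (not real osp or ipbd genes).
example_osp = "ATGGCTAAAGTTCTGACC"   # six codons
example_ipbd = "TGCCCGTGC"           # three codons

library = truncation_fusions(example_osp, example_ipbd, min_codons=4)
```

The `min_codons` floor reflects the observation that only
fragments above a certain size are incorporated into the
outer membrane; the threshold itself would have to be
found by experiment.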

In E. coli, the LamB protein is a well understood
OSP and can be used (BENS84, CHAR90, RONC90, VAND90,
CHAP90, MOLL90, CHAR88b, CHAR88c, CLEM81, DARG88,
FERE82a, FERE82b, FERE83, FERE84, FERE86a, FERE86b,
FERE89a, FERE89b, GEHR87, HALL82, NAKA86a, STAD86,
HEIN88, BENS87b, BENS87c, BOUG84, BOUL86a, CHAR84).
The E. coli LamB has been expressed in functional form
in S. typhimurium (DEVR84, BARB85, HARK87), V. cholerae
(HARK86), and K. pneumoniae (DEVR84, WEHM89), so that one
could display a population of PBDs in any of these
species as a fusion to E. coli LamB. K. pneumoniae
expresses a maltoporin similar to LamB (WEHM89) which
could also be used. In P. aeruginosa, the D1 protein (a
homologue of LamB) can be used (TRIA88).

LamB of E. coli is a porin for maltose and
maltodextrin transport, and serves as the receptor for
adsorption of bacteriophages λ and K10. LamB is
transported to the outer membrane if a functional
N-terminal sequence is present; further, the first 49
amino acids of the mature sequence are required for
successful transport (BENS84). As with other OSPs, LamB
of E. coli is synthesized with a typical signal
sequence which is subsequently removed. Homology
between parts of LamB protein and other outer membrane
proteins OmpC, OmpF, and PhoE has been detected
(NIKA84), including homology between LamB amino acids
39-49 and sequences of the other proteins. These
subsequences may label the proteins for transport to the
outer membrane.

The amino acid sequence of LamB is known (CLEM81),
and a model has been developed of how it anchors itself
to the outer membrane (reviewed by, among others,
BENZ88b). The locations of its maltose and phage binding
domains are also known (HEIN88). Using this
information, one may identify several strategies by
which a PBD insert may be incorporated into LamB to
provide a chimeric OSP which displays the PBD on the
bacterial outer membrane.

When the PBDs are to be displayed by a chimeric
transmembrane protein like LamB, the PBD could be
inserted into a loop normally found on the surface of
the cell (cf. BECK83, MANO86). Alternatively, we may
fuse a 5' segment of the osp gene to the ipbd gene
fragment; the point of fusion is picked to correspond to
a surface-exposed loop of the OSP and the carboxy
terminal portions of the OSP are omitted. In LamB, it
has been found that up to 60 amino acids may be inserted
(CHAR88b) with display of the foreign epitope resulting;
the structural features of OmpC, OmpA, OmpF, and PhoE
are so similar that one expects similar behavior from
these proteins.

It should be noted that while LamB may be 
characterized as a binding protein, it is used in the 
present invention to provide an OSTS; its binding 
domains are not variegated. 

Other bacterial outer surface proteins, such as
OmpA, OmpC, OmpF, PhoE, and pilin, may be used in place
of LamB and its homologues. OmpA is of particular
interest because it is very abundant and because
homologues are known in a wide variety of gram-negative
bacterial species. Baker et al. (BAKE87) review
assembly of proteins into the outer membrane of E. coli
and cite a topological model of OmpA (VOGE86) that
predicts that residues 19-32, 62-73, 105-118, and
147-158 are exposed on the cell surface. Insertion of an
ipbd-encoding fragment at about codon 111 or at about
codon 152 is likely to cause the IPBD to be displayed on
the cell surface. Concerning OmpA, see also MACI88 and
MANO88. Porin Protein F of Pseudomonas aeruginosa has
been cloned and has sequence homology to OmpA of E. coli
(DUCH88). Although this homology is not sufficient to
allow prediction of surface-exposed residues on Porin
Protein F, the methods used to determine the topological
model of OmpA may be applied to Porin Protein F. Works
related to use of OmpA as an OSP include BECK80 and
MACI88.
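
By way of illustration only, the relation between the
predicted surface-exposed segments of OmpA and the
proposed insertion codons can be checked in Python. The
segment boundaries are those quoted above from the
topological model; the function names are hypothetical,
and the sketch assumes that codon numbers and residue
numbers refer to the same positions.

```python
# Predicted surface-exposed residue ranges of OmpA (inclusive), from the text.
OMPA_EXPOSED = [(19, 32), (62, 73), (105, 118), (147, 158)]


def exposed_segment(position, segments=OMPA_EXPOSED):
    """Return the exposed segment containing the position, or None."""
    for first, last in segments:
        if first <= position <= last:
            return (first, last)
    return None


def segment_midpoint(segment):
    """Integer midpoint of an inclusive residue range."""
    first, last = segment
    return (first + last) // 2


# The two insertion sites proposed in the text.
site_111 = exposed_segment(111)
site_152 = exposed_segment(152)
midpoints = [segment_midpoint(s) for s in OMPA_EXPOSED]
```

Both proposed sites fall at the middle of their segments,
which is where an insert is least likely to disturb the
adjacent membrane-spanning stretches.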

Misra and Benson (MISR88a, MISR88b) disclose a
topological model of E. coli OmpC that predicts that,
among others, residues GLY164 and LEU250 are exposed on
the cell surface. Thus insertion of an ipbd gene
fragment at about codon 164 or at about codon 250 of the
E. coli ompC gene or at corresponding codons of the
S. typhimurium ompC gene is likely to cause IPBD to
appear on the cell surface. The ompC genes of other
bacterial species may be used. Other works related to
OmpC include CATR87 and CLIC88.

OmpF of E. coli is a very abundant OSP, ≥10^4
copies/cell. Pages et al. (PAGE90) have published a
model of OmpF indicating seven surface-exposed segments.
Fusion of an ipbd gene fragment, either as an insert or
to replace the 3' part of ompF, in one of the indicated
regions is likely to produce a functional ompF::ipbd
gene the expression of which leads to display of IPBD on
the cell surface. In particular, fusion at about codon
111, 177, 217, or 245 should lead to a functional
ompF::ipbd gene. Concerning OmpF, see also REID88b,
PAGE88, BENS88, TOMM82, and SODE85.

Pilus proteins are of particular interest because
piliated cells express many copies of these proteins and
because several species (N. gonorrhoeae, P. aeruginosa,
Moraxella bovis, Bacteroides nodosus, and E. coli)
express related pilins. Getzoff and coworkers (GETZ88,
PARG87, SOME85) have constructed a model of the
gonococcal pilus that predicts that the protein forms a
four-helix bundle having structural similarities to
tobacco mosaic virus protein and myohemerythrin. On
this model, both the amino and carboxy termini of the
protein are exposed. The amino terminus is methylated.
Elleman (ELLE88) has reviewed pilins of Bacteroides
nodosus and other species, noting that serotype
differences can be related to differences in the pilin
protein and that most variation occurs in the C-terminal
region. The amino-terminal portions of the pilin protein
are highly conserved. Jennings et al. (JENN89) have
grafted a fragment of foot-and-mouth disease virus
(residues 144-159) into the B. nodosus type 4 fimbrial
protein which is highly homologous to gonococcal pilin.
They found that expression of the 3'-terminal fusion in
P. aeruginosa led to a viable strain that makes
detectable amounts of the fusion protein. Jennings et
al. did not vary the foreign epitope nor did they
suggest any variation. They inserted a GLY-GLY linker
between the last pilin residue and the first residue of
the foreign epitope to provide a "flexible linker".
Thus a preferred place to attach an IPBD is the carboxy
terminus. The exposed loops of the bundle could also be
used, although the particular internal fusions tested by
Jennings et al. (JENN89) appeared to be lethal in P.
aeruginosa. Concerning pilin, see also MCKE85 and
ORND85.

Judd (JUDD86, JUDD85) has investigated Protein IA
of N. gonorrhoeae and found that the amino terminus is
exposed; thus, one could attach an IPBD at or near the
amino terminus of the mature P.IA as a means to display
the IPBD on the N. gonorrhoeae surface.

A model of the topology of PhoE of E. coli has been
disclosed by van der Ley et al. (VAND86). This model
predicts eight loops that are exposed; insertion of an
IPBD into one of these loops is likely to lead to
display of the IPBD on the surface of the cell.
Residues 158, 201, 238, and 275 are preferred locations
for insertion of an IPBD.

Other OSPs that could be used include E. coli BtuB,
FepA, FhuA, IutA, FecA, and FhuE (GUDM89) which are
receptors for nutrients usually found in low abundance.
The genes of all these proteins have been sequenced, but
topological models are not yet available. Gudmundsdottir
et al. (GUDM89) have begun the construction of such a
model for BtuB and FepA by showing that certain residues
of BtuB face the periplasm and by determining the
functionality of various BtuB::FepA fusions. Carmel et
al. (CARM90) have reported work of a similar nature for
FhuA. All Neisseria species express outer surface
proteins for iron transport that have been identified
and, in many cases, cloned. See also MORS87 and MORS88.

Many gram-negative bacteria express one or more
phospholipases. E. coli phospholipase A, product of the
pldA gene, has been cloned and sequenced by de Geus et
al. (DEGE84). They found that the protein appears at
the cell surface without any posttranslational
processing. An ipbd gene fragment can be attached at
either terminus or inserted at positions predicted to
encode loops in the protein. That phospholipase A
arrives on the outer surface without removal of a signal
sequence does not prove that a PldA::IPBD fusion protein
will also follow this route. Thus we might cause a
PldA::IPBD or IPBD::PldA fusion to be secreted into the
periplasm by addition of an appropriate signal sequence.
Thus, in addition to simple binary fusion of an ipbd
fragment to one terminus of pldA, the constructions:

1) ss::ipbd::pldA

2) ss::pldA::ipbd

should be tested. Once the PldA::IPBD protein is free
in the periplasm it does not remember how it got there
and the structural features of PldA that cause it to
localize on the outer surface will direct the fusion to
the same destination.

IV. D. Bacterial Spores as Genetic Packages: 

Bacterial spores have desirable properties as GP
candidates. Spores are much more resistant than
vegetative bacterial cells or phage to chemical and
physical agents, and hence permit the use of a great
variety of affinity selection conditions. Also,
Bacillus spores neither actively metabolize nor alter
the proteins on their surface. Spores have the
disadvantage that the molecular mechanisms that trigger
sporulation are less well worked out than is the
formation of M13 or the export of protein to the outer
membrane of E. coli.

Bacteria of the genus Bacillus form endospores that
are extremely resistant to damage by heat, radiation,
desiccation, and toxic chemicals (reviewed by Losick et
al. (LOSI86)). This phenomenon is attributed to
extensive intermolecular crosslinking of the coat
proteins. Endospores from the genus Bacillus are more
stable than are exospores from Streptomyces. Bacillus
subtilis forms spores in 4 to 6 hours, but Streptomyces
species may require days or weeks to sporulate. In
addition, genetic knowledge and manipulation is much
more developed for B. subtilis than for other
spore-forming bacteria. Thus Bacillus spores are
preferred over Streptomyces spores. Bacteria of the
genus Clostridium also form very durable endospores, but
clostridia, being strict anaerobes, are not convenient
to culture.

Viable spores that differ only slightly from wild-
type are produced in B. subtilis, even if any one of four
coat proteins is missing (DONO87). Moreover, plasmid
DNA is commonly included in spores, and plasmid encoded
proteins have been observed on the surface of Bacillus
spores (DEBR86). For these reasons, we expect that it
will be possible to express during sporulation a gene
encoding a chimeric coat protein, without interfering
materially with spore formation.

Donovan et al. have identified several polypeptide
components of B. subtilis spore coat (DONO87); the
sequences of two complete coat proteins and amino-
terminal fragments of two others have been determined.
Some, but not all, of the coat proteins are synthesized
as precursors and are then processed by specific
proteases before deposition in the spore coat (DONO87).
The 12 kd coat protein, CotD, contains 5 cysteines. CotD
also contains an unusually high number of histidines
(16) and prolines (7). The 11 kd coat protein, CotC,
contains only one cysteine and one methionine. CotC has
a very unusual amino-acid sequence with 19 lysines (K)
appearing as 9 K-K dipeptides and one isolated K. There
are also 20 tyrosines (Y) of which 10 appear as 5 Y-Y
dipeptides. Peptides rich in Y and K are known to
become crosslinked in oxidizing environments (DEVO78,
WAIT83, WAIT85, WAIT86). CotC contains 16 D and E amino
acids, a number nearly equal to the 19 Ks. There are no A, F,
R, I, L, N, P, Q, S, or W amino acids in CotC. Neither
CotC nor CotD is post-translationally cleaved, but the
proteins CotA and CotB are.

Since, in B. subtilis, some of the spore coat
proteins are post-translationally processed by specific
proteases, it is valuable to know the sequences of
precursors and mature coat proteins so that we can avoid
incorporating the recognition sequence of the specific
protease into our construction of an OSP-IPBD fusion.
The sequence of a mature spore coat protein contains
information that causes the protein to be deposited in
the spore coat; thus gene fusions that include some or
all of a mature coat protein sequence are preferred for
screening or selection for the display-of-IPBD
phenotype.

Fusions of ipbd fragments to cotC or cotD fragments
are likely to cause IPBD to appear on the spore surface.
The genes cotC and cotD are preferred osp genes because
CotC and CotD are not post-translationally cleaved.
Subsequences from cotA or cotB could also be used to
cause an IPBD to appear on the surface of B. subtilis
spores, but we must take the post-translational cleavage
of these proteins into account. DNA encoding IPBD could
be fused to a fragment of cotA or cotB at either end of
the coding region or at sites interior to the coding
region. Spores could then be screened or selected for
the display-of-IPBD phenotype.

The promoter of a spore coat protein is most
active: a) when spore coat protein is being synthesized
and deposited onto the spore and b) in the specific
place that spore coat proteins are being made. The
sequences of several sporulation promoters are known;
coding sequences operatively linked to such promoters
are expressed only during sporulation. Ray et al.
(RAYC87) have shown that the G4 promoter of B. subtilis
is directly controlled by RNA polymerase bound to σ^E. To
date, no Bacillus sporulation promoter has been shown to
be inducible by an exogenous chemical inducer as is the lac
promoter of E. coli. Nevertheless, the quantity of
protein produced from a sporulation promoter can be
controlled by other factors, such as the DNA sequence
around the Shine-Dalgarno sequence or codon usage.
Chemically inducible sporulation promoters can be
developed if necessary.

IV. E. Artificial OSPs

It is generally preferable to use as the genetic
package a cell, spore or virus for which an outer
surface protein which can be engineered to display a
IPBD has already been identified. However, the present
invention is not limited to such genetic packages.

It is believed that the conditions for an outer
surface transport signal in a bacterial cell or spore
are not particularly stringent, i.e., a random
polypeptide of appropriate length (preferably 30-100
amino acids) has a reasonable chance of providing such a
signal. Thus, by constructing a chimeric gene
comprising a segment encoding the IPBD linked to a
segment of random or pseudorandom DNA (the potential
OSTS), and placing this gene under control of a suitable
promoter, there is a possibility that the chimeric
protein so encoded will function as an OSP-IPBD.

This possibility is greatly enhanced by
constructing numerous such genes, each having a
different potential OSTS, cloning them into a suitable
host, and selecting for transformants bearing the IPBD
(or other marker) on their outer surface. Use of
secretion-permissive mutants, such as prlA4 (LISS85) or
prlG (STAD89), can increase the probability of obtaining
a working OSP-IPBD.

When seeking to display a IPBD on the surface of a
bacterial cell, as an alternative to choosing a natural
OSP and an insertion site in the OSP, we can construct a
gene (the "display probe") comprising: a) a regulatable
promoter (e.g., lacUV5), b) a Shine-Dalgarno sequence,
c) a periplasmic transport signal sequence, d) a fusion
of the ipbd gene with a segment of random DNA (as in
Kaiser et al. (KAIS87)), e) a stop codon, and f) a
transcriptional terminator.

When the genetic package is a spore, we can use the
approach described above for attaching a IPBD to an
E. coli cell, except that: a) a sporulation promoter is
used, and b) no periplasmic signal sequence should be
present.

For phage, because the OSP-IPBD fulfills a
structural role in the phage coat, it is unlikely that
any particular random DNA sequence coupled to the ipbd
gene will produce a fusion protein that fits into the
coat in a functional way. Nevertheless, random DNA
inserted between large fragments of a coat protein gene
and the pbd gene will produce a population that is
likely to contain one or more members that display the
IPBD on the outside of a viable phage.

As previously stated, the purpose of the random DNA
is to encode an OSTS, like that embodied in known OSPs.
The fusion of ipbd and the random DNA could be in either
order, but ipbd upstream is slightly preferred. From
the population generated in this way, we select GPs
that display IPBD on the GP surface. Alternatively,
clonal isolates of GPs may be screened for the display-
of-IPBD phenotype.

The preference for ipbd upstream of the random DNA 
arises from consideration of the manner in which the 
successful GP(IPBD) will be used. The present invention 
contemplates introducing numerous mutations into the pbd 
region of the osp-pbd gene, which, depending on the 
variegation scheme, might include gratuitous stop
codons. If pbd precedes the random DNA, then gratuitous
stop codons in pbd lead to no OSP-PBD protein appearing
on the cell surface. If pbd follows the random DNA,
then gratuitous stop codons in pbd might lead to
incomplete OSP-PBD proteins appearing on the cell
surface. Incomplete proteins often are non-specifically
sticky so that GPs displaying incomplete PBDs are easily
removed from the population.

The random DNA may be obtained in a variety of
ways. Degenerate synthetic DNA is one possibility.
Alternatively, pseudorandom DNA can be generated from
any DNA having high sequence diversity, e.g., the genome
of the organism, by partially digesting with an enzyme
that cuts very often, e.g., Sau3AI. Alternatively, one
could shear DNA having high sequence diversity, blunt
the sheared DNA with the large fragment of E. coli DNA
polymerase I (hereinafter referred to as Klenow
fragment), and clone the sheared and blunted DNA into
blunt sites of the vector (MANI82, p295, AUSU87).

If random DNA and phenotypic selection or screening
are used to obtain a GP(IPBD), then we clone random DNA
into one of the restriction sites that was designed into
the display probe. A plasmid carrying the display probe
is digested with the appropriate restriction enzyme and
the fragmented, random DNA is annealed and ligated by
standard methods. The ligated plasmids are used to
transform cells that are grown and selected for
expression of the antibiotic-resistance gene. Plasmid-
bearing GPs are then selected for the display-of-IPBD
phenotype by the affinity selection methods described
hereafter, using AfM(IPBD) as if it were the target.
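
As an illustration of how a partial digest with a frequent cutter yields pseudorandom fragments for cloning, the following Python sketch lists the fragments that could arise from a short sequence. It is not part of the original disclosure: the function names and the example sequence are hypothetical, and the only enzyme fact assumed is that Sau3AI cuts immediately 5' of its GATC recognition site.

```python
def sau3ai_sites(seq):
    """Return 0-based positions where Sau3AI would cut (just 5' of each GATC)."""
    seq = seq.upper()
    sites = []
    i = seq.find("GATC")
    while i != -1:
        sites.append(i)
        i = seq.find("GATC", i + 1)
    return sites


def partial_digest_fragments(seq):
    """List every fragment bounded by two cut sites.

    In a partial digest some sites are left uncut, so fragments can
    span one or more internal GATC sites. Only the top strand is shown;
    each fragment begins with the GATC of its left-hand cut.
    """
    seq = seq.upper()
    cuts = sau3ai_sites(seq)
    fragments = []
    for a in range(len(cuts)):
        for b in range(a + 1, len(cuts)):
            fragments.append(seq[cuts[a]:cuts[b]])
    return fragments


# Hypothetical 22-nt example with three GATC sites.
example = "AAGATCTTTTGATCCCGATCGG"
print(sau3ai_sites(example))
print(partial_digest_fragments(example))
```

With three sites the sketch reports three possible internal fragments; a real genome would give a very large and diverse set, which is the property exploited in the text.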

As an alternative to selecting GP(IPBD)s through 
binding to an affinity column, we can isolate colonies 
or plaques and screen for successful artificial OSPs 
through use of one of the methods listed below for 
verification of the display strategy. 

IV. F. Designing the osp-ipbd gene insert:
Genetic Construction and Expression Considerations

The (i)pbd-osp gene may be: a) completely
synthetic, b) a composite of natural and synthetic DNA,
or c) a composite of natural DNA fragments. The
important point is that the pbd segment be easily
variegated so as to encode a multitudinous and diverse
family of PBDs as previously described. A synthetic
ipbd segment is preferred because it allows greatest
control over placement of restriction sites. Primers
complementary to regions abutting the osp-ipbd gene on
its 3' flank and to parts of the osp-ipbd gene that are
not to be varied are needed for sequencing.

The sequences of regulatory parts of the gene are 
taken from the sequences of natural regulatory elements: 
a) promoters, b) Shine-Dalgarno sequences, and c) 
transcriptional terminators. Regulatory elements could 
also be designed from knowledge of consensus sequences 
of natural regulatory regions. The sequences of these 
regulatory elements are connected to the coding regions; 
restriction sites are also inserted in or adjacent to 
the regulatory regions to allow convenient manipulation. 

The essential function of the affinity separation 
is to separate GPs that bear PBDs (derived from IPBD) 
having high affinity for the target from GPs bearing 
PBDs having low affinity for the target. If the elution 
volume of a GP depends on the number of PBDs on the GP 
surface, then a GP bearing many PBDs with low affinity, 
GP(PBD_W), might co-elute with a GP bearing fewer PBDs
with high affinity, GP(PBD_S). Regulation of the osp-pbd
gene preferably is such that most packages display 
sufficient PBD to effect a good separation according to 
affinity. Use of a regulatable promoter to control the 
level of expression of the osp-pbd allows fine 
adjustment of the chromatographic behavior of the 
variegated population. 

Induction of synthesis of engineered genes in
vegetative bacterial cells has been exercised through
the use of regulated promoters such as lacUV5, trpP, or
tac (MANI82). The factors that regulate the quantity of
protein synthesized include: a) promoter strength (cf.
HOOP87), b) rate of initiation of translation (cf.
GOLD87), c) codon usage, d) secondary structure of mRNA,
including attenuators (cf. LAND87) and terminators (cf.
YAGE87), e) interaction of proteins with mRNA (cf.
MCPH86, MILL87b, WINT87), f) degradation rates of mRNA
(cf. BRAW87, KING86), g) proteolysis (cf. GOTT87).
These factors are sufficiently well understood that a
wide variety of heterologous proteins can now be
produced in E. coli, B. subtilis and other host cells in
at least moderate quantities (SKER88, BETT88).
Preferably, the promoter for the osp-ipbd gene is
subject to regulation by a small chemical inducer. For
example, the lac promoter and the hybrid trp-lac (tac)
promoter are regulatable with isopropyl thiogalactoside
(IPTG). Hereinafter, we use "XINDUCE" as a generic term
for a chemical that induces expression of a gene. The
promoter for the constructed gene need not come from a
natural osp gene; any regulatable bacterial promoter can
be used.

Transcriptional regulation of gene expression is 
best understood and most effective, so we focus our 
attention on the promoter. If transcription of the osp- 
ipbd gene is controlled by the chemical XINDUCE, then 
the number of OSP-IPBDs per GP increases for increasing 
concentrations of XINDUCE until a fall-off in the number
of viable packages is observed or until sufficient IPBD
is observed on the surface of harvested GP(IPBD)s. The
attributes that affect the maximum number of OSP-IPBDs
per GP are primarily structural in nature. There may be
steric hindrance or other unwanted interactions between
IPBDs if OSP-IPBD is substituted for every wild-type
OSP. Excessive levels of OSP-IPBD may also adversely
affect the solubility or morphogenesis of the GP. For
cellular and viral GPs, as few as five copies of a 
protein having affinity for another immobilized molecule 
have resulted in successful affinity separations 
(FERE82a, FERE82b, and SMIT85) . 

A non-leaky promoter is preferred. Non-leakiness
is useful: a) to show that affinity of GP(osp-ipbd)s
for AfM(IPBD) is due to the osp-ipbd gene, and b) to
allow growth of GP(osp-ipbd) in the absence of XINDUCE
if the expression of osp-ipbd is disadvantageous. The
lacUV5 promoter in conjunction with the LacI^q repressor
is a preferred example.

An exemplary osp-ipbd gene has the DNA sequence
shown in Table 25 and there annotated to explain the
useful restriction sites and biologically important
features, viz. the lacUV5 promoter, the lacO operator,
the Shine-Dalgarno sequence, the amino acid sequence,
the stop codons, and the trp attenuator transcriptional
terminator.

The present invention is not limited to a single 
method of gene design. The osp-ipbd gene need not be 
synthesized in toto; parts of the gene may be obtained
from nature. One may use any genetic engineering method
to produce the correct gene fusion, so long as one can
easily and accurately direct mutations to specific sites
in the pbd DNA subsequence. In all of the methods of
mutagenesis considered in the present invention,
however, it is necessary that the coding sequence for
the osp-ipbd gene be different from any other DNA in the
OCV. The degree and nature of difference needed is
determined by the method of mutagenesis to be used. If
the method of mutagenesis is to be replacement of
subsequences coding for the PBD with vgDNA, then the
subsequences to be mutagenized are preferably bounded by
restriction sites that are unique with respect to the
rest of the OCV. Use of non-unique sites involves
partial digestion which is less efficient than complete
digestion of a unique site and is not preferred. If
single-stranded-oligonucleotide-directed mutagenesis is
to be used, then the DNA sequence of the subsequence
coding for the IPBD must be unique with respect to the
rest of the OCV.

The coding portions of genes to be synthesized are 
designed at the protein level and then encoded in DNA. 
The amino acid sequences are chosen to achieve various 
goals, including: a) display of a IPBD on the surface 
of a GP, b) change of charge on a IPBD, and c) 
generation of a population of PBDs from which to select 
an SBD. These issues are discussed in more detail below.
The ambiguity in the genetic code is exploited to allow
optimal placement of restriction sites and to create
various distributions of amino acids at variegated
codons.

While the invention does not require any particular
number or placement of restriction sites, it is
generally preferable to engineer restriction sites into
the gene to facilitate subsequent manipulations.
Preferably, the gene provides a series of fairly
uniformly spaced unique restriction sites with no more
than a preset maximum number of bases, for example 100,
between sites. Preferably, the gene is designed so that
its insertion into the OCV does not destroy the
uniqueness of unique restriction sites of the OCV.
Preferred recognition sites are those for restriction
enzymes which a) generate cohesive ends, b) have
unambiguous recognition, or c) have higher specific
activity.
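
The spacing preference stated above can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the function names and the site coordinates are hypothetical, the maximum of 100 bases is the example value given in the text, and only the gaps between consecutive sites (not the distances to the gene ends) are examined.

```python
def site_gaps(positions):
    """Return the number of bases between consecutive restriction sites."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def spacing_ok(positions, max_gap=100):
    """True if no two neighbouring unique sites are more than max_gap apart."""
    gaps = site_gaps(positions)
    return all(gap <= max_gap for gap in gaps)


# Hypothetical coordinates of unique sites within a designed gene.
print(site_gaps([10, 95, 180, 270]), spacing_ok([10, 95, 180, 270]))
print(site_gaps([10, 95, 180, 300]), spacing_ok([10, 95, 180, 300]))
```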

The ambiguity of the DNA between the restriction
sites is resolved from the following considerations. If
the given amino acid sequence occurs in the recipient
organism, and if the DNA sequence of the gene in the
organism is known, then, preferably, we maximize the
differences between the engineered and natural genes to
minimize the potential for recombination. In addition,
the following codons are poorly translated in E. coli
and, therefore, are avoided if possible: cta (L), cga
(R), cgg (R), and agg (R). For other host species,
different codon restrictions would be appropriate.
Finally, long repeats of any one base are prone to
mutation and thus are avoided. Balancing these
considerations, we can design a DNA sequence.
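
Two of these considerations, the avoided codons and long single-base repeats, lend themselves to an automatic check of a candidate coding sequence. The Python sketch below is a minimal illustration, not the procedure actually used: the function names are hypothetical, the avoided-codon list is the one given above for E. coli, and the run-length threshold of six is an assumed value, since the text does not state one.

```python
import re

# Codons described in the text as poorly translated in E. coli.
AVOIDED_CODONS = {"CTA": "L", "CGA": "R", "CGG": "R", "AGG": "R"}


def avoided_codons(cds):
    """Return (offset, codon) for each in-frame avoided codon in a coding sequence."""
    cds = cds.upper()
    hits = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in AVOIDED_CODONS:
            hits.append((i, codon))
    return hits


def long_runs(seq, min_run=6):
    """Return (offset, run) for each stretch of min_run or more identical bases.

    min_run=6 is an assumed threshold chosen for illustration only.
    """
    pattern = r"([ACGT])\1{%d,}" % (min_run - 1)
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, seq.upper())]


# Hypothetical reading frame: ATG CTA CGT AGG AAA AAA ATA (+ one trailing base).
candidate = "ATGCTACGTAGGAAAAAAATAA"
print(avoided_codons(candidate))
print(long_runs(candidate))
```

A designer would replace each reported codon with a synonymous one and break up each reported run, then re-check that no needed restriction site has been lost.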
Structural Considerations 

The design of the amino-acid sequence for the ipbd- 
osp gene to encode involves a number of structural 
considerations. The design is somewhat different for 
each type of GP. In bacteria, OSPs are not essential, 
so there is no requirement that the OSP domain of a 
fusion have any of its parental functions beyond lodging 
in the outer membrane. 
Relationship between PBD and OSP 

It is not required that the PBD and OSP domains 
have any particular spatial relationship; hence the 
process of this invention does not require use of the 
method of US Patent '692. 

It is, in fact, desirable that the OSP not 
constrain the orientation of the PBD domain; this is not 
to be confused with lack of constraint within the PBD. 
Cwirla et al. (CWIR90), Scott and Smith (SCOT90), and
Devlin et al. (DEVL90), have taught that variable
residues in phage-displayed random peptides should be
free of influence from the phage OSP. We teach that
binding domains having a moderate to high degree of
conformational constraint will exhibit higher
specificity and that higher affinity is also possible.
Thus, we prescribe picking codons for variegation that
specify amino acids that will appear in a well-defined
framework. The nature of the side groups is varied
through a very wide range due to the combinatorial
replacement of multiple amino acids. The main chain
conformations of most PBDs of a given class are very
similar. The movement of the PBD relative to the OSP
should not, however, be restricted. Thus it is often
appropriate to include a flexible linker between the PBD
and the OSP. Such flexible linkers can be taken from
naturally occurring proteins known to have flexible
regions. For example, the gIII protein of M13 contains
glycine-rich regions thought to allow the amino-terminal
domains a high degree of freedom. Such flexible linkers
may also be designed. Segments of polypeptides that are 
rich in the amino acids GLY, ASN, SER, and ASP are 
likely to give rise to flexibility. Multiple glycines 
are particularly preferred. 
Constraints imposed by OSP 

When we choose to insert the PBD into a surface 
loop of an OSP such as LamB, OmpA, or M13 gIII protein,
there are a few considerations that do not arise when
PBD is joined to the end of an OSP. In these cases, the
OSP exerts some constraining influence on the PBD; the
ends of the PBD are held in more or less fixed
positions. We could insert a highly varied DNA sequence
into the osp gene at codons that encode a surface-
exposed loop and select for cells that have a specific-
binding phenotype. When the identified amino-acid
sequence is synthesized (by any means), the constraint
of the OSP is lost and the peptide is likely to have a
much lower affinity for the target and a much lower
specificity. Tan and Kaiser (TANN77) found that a
synthetic model of BPTI containing all the amino acids
of BPTI that contact trypsin has a Kd for trypsin
approximately 10^7-fold higher than that of BPTI. Thus,
it is strongly preferred that the varied amino acids be
part of a PBD in which the structural constraints are
supplied by the PBD.

It is known that the amino acids adjoining foreign 
epitopes inserted into LamB influence the immunological 
properties of these epitopes (VAND90) . We expect that 
PBDs inserted into loops of LamB, OmpA, or similar OSPs 
will be influenced by the amino acids of the loop and by 
the OSP in general. To obtain appropriate display of
the PBD, it may be necessary to add one or more linker 
amino acids between the OSP and the PBD. Such linkers 
may be taken from natural proteins or designed on the 
basis of our knowledge of the structural behavior of 
amino acids. Sequences rich in GLY, SER, ASN, ASP, ARG, 
and THR are appropriate. One to five amino acids at 
either junction are likely to impart the desired degree 
of flexibility between the OSP and the PBD. 
Phage OSP 

A preferred site for insertion of the ipbd gene 
into the phage osp gene is one in which: a) the IPBD 
folds into its original shape, b) the OSP domains fold 
into their original shapes, and c) there is no 
interference between the two domains.

If there is a model of the phage that indicates
that either the amino or carboxy terminus of an OSP is
exposed to solvent, then the exposed terminus of that
mature OSP becomes the prime candidate for insertion of
the ipbd gene. A low resolution 3D model suffices.

In the absence of a 3D structure, the amino and
carboxy termini of the mature OSP are the best
candidates for insertion of the ipbd gene. A functional
fusion may require additional residues between the IPBD
and OSP domains to avoid unwanted interactions between
the domains. Random-sequence DNA or DNA coding for a
specific sequence of a protein homologous to the IPBD or
OSP can be inserted between the osp fragment and the
ipbd fragment if needed.

Fusion at a domain boundary within the OSP is also
a good approach for obtaining a functional fusion.
Smith exploited such a boundary when subcloning
heterologous DNA into gene III of f1 (SMIT85).

The criteria for identifying OSP domains suitable
for causing display of an IPBD are somewhat different
from those used to identify an IPBD. When identifying
an OSP, minimal size is not so important because the OSP
domain will not appear in the final binding molecule nor
will we need to synthesize the gene repeatedly in each
variegation round. The major design concerns are that:
a) the OSP::IPBD fusion causes display of IPBD, b) the
initial genetic construction be reasonably convenient,
and c) the osp::ipbd gene be genetically stable and
easily manipulated. There are several methods of
identifying domains. Methods that rely on atomic
coordinates have been reviewed by Janin and Chothia
(JANI85). These methods use matrices of distances
between α carbons (Cα), dividing planes (cf. ROSE85), or
buried surface (RASH84). Chothia and collaborators
have correlated the behavior of many natural proteins
with domain structure (according to their definition).
Rashin correctly predicted the stability of a domain
comprising residues 206-316 of thermolysin (VITA84,
RASH84).

Many researchers have used partial proteolysis and
protein sequence analysis to isolate and identify stable
domains. (See, for example, VITA84, POTE83, SCOT87a, and
PABO79.) Pabo et al. used calorimetry as an indicator
that the cI repressor from the coliphage λ contains two
domains; they then used partial proteolysis to determine
the location of the domain boundary.

If the only structural information available is the
amino acid sequence of the candidate OSP, we can use the
sequence to predict turns and loops. There is a high
probability that some of the loops and turns will be
correctly predicted (cf. Chou and Fasman, (CHOU74));
these locations are also candidates for insertion of the
ipbd gene fragment.

Bacterial OSPs

In bacterial OSPs, the major considerations are: 
a) that the PBD is displayed, and b) that the chimeric 
protein not be toxic. 

From topological models of OSPs, we can determine 
whether the amino or carboxy termini of the OSP are
exposed. If so, then these are excellent choices for
fusion of the osp fragment to the ipbd fragment. 

The lamB gene has been sequenced and is available
on a variety of plasmids (CLEM81, CHAR88). Numerous
fusions of fragments of lamB with a variety of other
genes have been used to study export of proteins in E.
coli. From various studies, Charbit et al. (CHAR88)
have proposed a model that specifies which residues of
LamB are: a) embedded in the membrane, b) facing the
periplasm, and c) facing the cell surface; we adopt the
numbering of this model for amino acids in the mature
protein. According to this model, several loops on the
outer surface are defined, including: 1) residues 88
through 111, 2) residues 145 through 165, and 3) residues
236 through 251.

Consider a mini-protein embedded in LamB. For
example, insertion of DNA encoding
G1-N-X-C-X5-X-X-X-C-X10-S-G12 (SEQ ID NO: 8) between
codons 153 and 154 of lamB is likely to
lead to a wide variety of LamB derivatives being
expressed on the surface of E. coli cells. G1, N2, S11,
and G12 are supplied to allow the mini-protein sufficient
orientational freedom that it can interact optimally
with the target. Using affinity enrichment (involving,
for example, FACS via a fluorescently labeled target,
perhaps through several rounds of enrichment), we might
obtain a strain (named, for example, BEST) that
expresses a particular LamB derivative that shows high
affinity for the predetermined target. An octapeptide
having the sequence of the inserted residues 3 through
10 from BEST is likely to have an affinity and
specificity similar to that observed in BEST because the
octapeptide has an internal structure that keeps the
amino acids in a conformation that is quite similar in
the LamB derivative and in the isolated mini-protein.
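
The layout of this 12-residue insert, and the octapeptide (residues 3 through 10) recovered from a selected clone, can be made concrete with a short Python sketch. It is illustrative only: the function names and the example clone sequence are hypothetical, and treating each X as any of the 20 natural amino acids is an assumption made here to estimate library size.

```python
# Insert template from the text: G1 N2 X3 C4 X5 X6 X7 X8 C9 X10 S11 G12,
# written in one-letter code with X marking a varied position.
TEMPLATE = "GNXCXXXXCXSG"


def matches_template(insert):
    """True if a 12-residue insert has the fixed residues the template requires."""
    if len(insert) != len(TEMPLATE):
        return False
    return all(t == "X" or t == aa for t, aa in zip(TEMPLATE, insert.upper()))


def octapeptide(insert):
    """Return residues 3 through 10 (1-based) of the insert, i.e. the mini-protein."""
    if not matches_template(insert):
        raise ValueError("insert does not fit the G-N-X-C-X-X-X-X-C-X-S-G template")
    return insert.upper()[2:10]


# Assumed: 20 possibilities at each varied position.
library_size = 20 ** TEMPLATE.count("X")

hypothetical_clone = "GNACDEFGCHSG"  # invented sequence for illustration
print(octapeptide(hypothetical_clone), library_size)
```

The two cysteines fall at positions 2 and 7 of the recovered octapeptide, which is what allows the isolated peptide to keep the constrained conformation it had within LamB.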
Consideration of the Signal Peptide 

Fusing one or more new domains to a protein may 
make the ability of the new protein to be exported from 
the cell different from the ability of the parental 
protein. The signal peptide of the wild-type coat 
protein may function for authentic polypeptide but be 
unable to direct export of a fusion. To utilize the 
Sec-dependent pathway, one may need a different signal 
peptide. Thus, to express and display a chimeric 
BPTI/M13 gene VIII protein, we found it necessary to 
utilize a heterologous signal peptide (that of phoA).
Provision of a means to remove PBD from the GP 

GPs that display peptides having high affinity for 
the target may be quite difficult to elute from the 
target, particularly a multivalent target. (Bacteria
that are bound very tightly can simply multiply in
situ.) For phage, one can introduce a cleavage site for
a specific protease, such as blood-clotting Factor Xa,
into the fusion OSP protein so that the binding domain
can be cleaved from the genetic package. Such cleavage
has the advantage that all resulting phage have
identical OSPs and therefore are equally infective, even
if polypeptide-displaying phage can be eluted from the
affinity matrix without cleavage. This step allows
recovery of valuable genes which might otherwise be
lost. To our knowledge, no one has disclosed or
suggested using a specific protease as a means to
recover an information-containing genetic package or of
converting a population of phage that vary in
infectivity into phage having identical infectivity.

IV. G. Synthesis of Gene Inserts

The present invention is not limited as to how a 
designed DNA sequence is divided for easy synthesis. An 
established method is to synthesize both strands of the 
entire gene in overlapping segments of 20 to 50 
nucleotides (nts) (THER88) . An alternative method that 
is more suitable for synthesis of vgDNA is an adaptation 
of methods published by Oliphant et al. (OLIP86 and
OLIP87) and Ausubel et al. (AUSU87). It differs from
previous methods in that it: a) uses two synthetic
strands, and b) does not cut the extended DNA in the
middle. Our goals are: a) to produce longer pieces of
dsDNA than can be synthesized as ssDNA on commercial DNA
synthesizers, and b) to produce strands complementary to
single-stranded vgDNA. By using two synthetic strands,
we remove the requirement for a palindromic sequence at
the 3' end.

DNA synthesizers can currently produce oligo-nts of
lengths up to 200 nts in reasonable yield, M_DNA = 200.
The parameters N_w (the length of overlap needed to obtain
efficient annealing) and N_s (the number of spacer bases
needed so that a restriction enzyme can cut near the end
of blunt-ended dsDNA) are determined by DNA and enzyme
chemistry. N_w = 10 and N_s = 5 are reasonable values.
Larger values of N_w and N_s are allowed but add to the
length of ssDNA that is to be synthesized and reduce the
net length of dsDNA that can be produced.

Let A_L be the actual length of dsDNA to be
synthesized, including any spacers. A_L must be no greater
than (2 M_DNA - N_w). Let Q_w be the number of nts that the
overlap window can deviate from center,

Q_w = (2 M_DNA - N_w - A_L)/2.

Q_w is never negative. It is preferred that the two
fragments be approximately the same length so that the
amounts synthesized will be approximately equal. This
preference may be overridden by other considerations.
The overall yield of dsDNA is usually dominated by the
synthetic yield of the longer oligo-nt.
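
The relation for the window slack can be evaluated directly. The short Python sketch below simply restates the formula given above; the function name is hypothetical, and the defaults are the values of M_DNA and N_w suggested in the text.

```python
def overlap_slack(a_l, m_dna=200, n_w=10):
    """Return Q_w = (2*M_DNA - N_w - A_L) / 2 for a dsDNA of length a_l nts.

    Raises ValueError when a_l exceeds the maximum (2*M_DNA - N_w), the
    case in which Q_w would be negative.
    """
    max_len = 2 * m_dna - n_w
    if a_l > max_len:
        raise ValueError(f"A_L = {a_l} exceeds the maximum of {max_len} nts")
    return (max_len - a_l) / 2


print(overlap_slack(390))  # longest allowed fragment: window fixed at center
print(overlap_slack(300))  # shorter fragment: window may shift either way
```

For a 300-nt target the window can be moved up to 45 nts from center before one oligo-nt would exceed 200 nts.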

We use the following procedure to generate dsDNA of
lengths up to (2 M_DNA - N_w) nts through the use of Klenow
fragment to extend synthetic ssDNA fragments that are
not more than M_DNA nts long. When a pair of long oligo-
nts, complementary for N_w nts at their 3' ends, are
annealed there will be a free 3' hydroxyl and a long
ssDNA chain continuing in the 5' direction on either
side. We will refer to this situation as a 5'
superoverhang. The procedure comprises:

1) picking a non-palindromic subsequence of N_w to N_w+4
nts near the center of the dsDNA to be
synthesized; this region is called the overlap
(typically, N_w is 10),

2) synthesizing a ssDNA molecule that comprises that
part of the anti-sense strand from its 5' end up to
and including the overlap,

3) synthesizing a ssDNA molecule that comprises that
part of the sense strand from its 5' end up to and
including the overlap,

4) annealing the two synthetic strands that are
complementary throughout the overlap region, and

5) extending both superoverhangs with Klenow fragment
and all four deoxynucleotide triphosphates.

Because M_DNA is not rigidly fixed at 200, the current
limits of 390 (= 2 M_DNA - N_w) nts overall and 200 in each
fragment are not rigid, but can be exceeded by 5 or 10
nts. Going beyond the limits of 390 and 200 will lead
to lower yields, but these may be acceptable in certain
cases.

Restriction enzymes do not cut well at sites closer
than about five base pairs from the end of blunt dsDNA
fragments (OLIP87 and p. 132 New England BioLabs 1990-
1991 Catalogue). Therefore N_s nts (with N_s typically set
to 5) of spacer are added to ends that we intend to cut
with a restriction enzyme. If the plasmid is to be cut
with a blunt-cutting enzyme, then we do not add any
spacer to the corresponding end of the dsDNA fragment.

To choose the optimum site of overlap for the
oligo-nt fragments, first consider the anti-sense strand
of the DNA to be synthesized, including any spacers at
the ends, written (in upper case) from 5' to 3' and
left-to-right. N.B.: The N_w-nt-long overlap window can
never include bases that are to be variegated.
The N_w-nt-long overlap should not be palindromic lest
single DNA molecules prime themselves. Place a N_w-nt-
long window as close to the center of the anti-sense
sequence as possible. Check to see whether one or more
codons within the window can be changed to increase the
GC content without: a) destroying a needed restriction
site, b) changing amino acid sequence, or c) making the
overlap region palindromic. If possible, change some AT
base pairs to GC pairs. If the GC content of the window
is less than 50%, slide the window right or left as much
as Q_w nts to maximize the number of C's and G's inside
the window, but without including any variegated bases.
For each trial setting of the overlap window, maximize
the GC content by silent codon changes, but do not
destroy wanted restriction sites or make the overlap
palindromic. If the best setting still has less than
50% GC, enlarge the window to N_w+2 nts and place it
within five nts of the center to obtain the maximum GC
content. If enlarging the window one or two nts will
increase the GC content, do so, but do not include
variegated bases.
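The window search described above can be sketched in Python. This is a simplified illustration, not the full procedure: it slides a fixed-length window by up to Q_w nts, rejects windows that are palindromic or that touch variegated positions, and keeps the highest GC fraction (ties go to the window nearest the center). Silent codon changes and window enlargement are deliberately left out. The names `choose_overlap`, `revcomp`, and `gc_fraction` are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def choose_overlap(seq, variegated, n_w=10, q_w=0):
    """Pick the overlap window on the anti-sense strand (5'->3', upper case).

    variegated : set of 0-based positions that must stay outside the window
    Returns (gc_fraction, start, window) or None if no window is allowed.
    """
    center_start = (len(seq) - n_w) // 2
    best = None
    # Try the centered window first, then progressively larger shifts.
    for shift in sorted(range(-q_w, q_w + 1), key=abs):
        start = center_start + shift
        if start < 0 or start + n_w > len(seq):
            continue
        if any(pos in variegated for pos in range(start, start + n_w)):
            continue                      # window would include variegated bases
        window = seq[start:start + n_w]
        if window == revcomp(window):
            continue                      # palindromic: strands could self-prime
        gc = gc_fraction(window)
        if best is None or gc > best[0]:
            best = (gc, start, window)
    return best
```

A caller would still need to apply the 50% GC threshold and the codon-level adjustments by hand or with further code.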

Underscore the anti-sense strand from the 5' end up
to the right edge of the window. Write the
complementary sense sequence 3'-to-5' and left-to-right
and in lower case letters, under the anti-sense strand
starting at the left edge of the window and continuing
all the way to the right end of the anti-sense strand.

We will synthesize the underscored anti-sense 
strand and the part of the sense strand that we wrote. 
These two fragments, complementary over the length of 
the window of high GC content, are mixed in equimolar 
quantities and annealed. These fragments are extended 
with Klenow fragment and all four deoxynucleotide 
triphosphates to produce blunt-ended dsDNA. This DNA
can be cut with appropriate restriction enzymes to 
produce the cohesive ends needed to ligate the fragment 
to other DNA. 
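The division into two oligo-nts and the subsequent fill-in can be modeled on strings. The following Python sketch is an idealized illustration only (perfect annealing, complete extension); the function names `split_oligos` and `klenow_fill` are hypothetical, and both oligo-nts are written 5'-to-3'.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def split_oligos(antisense, start, n_w):
    """Return the two oligo-nts to synthesize, each written 5'->3'.

    antisense : full anti-sense strand, 5'->3', including spacers
    start     : 0-based left edge of the overlap window
    n_w       : window length
    """
    end = start + n_w
    top = antisense[:end]                  # 5' end through right edge of window
    bottom = revcomp(antisense[start:])    # other strand, through left edge
    return top, bottom


def klenow_fill(top, bottom, n_w):
    """Idealized fill-in: anneal via the 3'-terminal n_w nts and extend both
    3' ends; returns the completed top strand of the blunt duplex."""
    if top[-n_w:] != revcomp(bottom[-n_w:]):
        raise ValueError("3' ends are not complementary over the overlap")
    return top + revcomp(bottom[:-n_w])
```

For a 50-nt target with a 10-nt window starting at position 20, both oligo-nts are 30 nts long, and the filled-in top strand reproduces the original 50-nt sequence.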

The present invention is not limited to any
particular method of DNA synthesis or construction.
Conventional DNA synthesizers may be used, with appropriate
reagent modifications for production of variegated DNA
(similar to that now used for production of mixed
probes). For example, the Milligen 7500 DNA synthesizer
has seven vials from which phosphoramidites may be
taken. Normally, the first four contain A, C, T, and G.
The other three vials may contain unusual bases such as
inosine or mixtures of bases, the so-called "dirty
bottle". The standard software allows programmed mixing
of two, three, or four bases in equimolar quantities.

The synthesized DNA may be purified by any
art-recognized technique, e.g., by high-pressure liquid
chromatography (HPLC) or PAGE.

The osp-pbd genes may be created by inserting vgDNA
into an existing parental gene, such as the osp-ipbd
shown to be displayable by a suitably transformed GP.
The present invention is not limited to any particular
method of introducing the vgDNA; however, two techniques
are discussed below.

In the case of cassette mutagenesis, the 
restriction sites that were introduced when the gene for 
the inserted domain was synthesized are used to 
introduce the synthetic vgDNA into a plasmid or other 
OCV. Restriction digestions and ligations are performed
by standard methods (AUSU87).

In the case of single-stranded-oligonucleotide-
directed mutagenesis, synthetic vgDNA is used to create
diversity in the vector (BOTS85).

The modes of creating diversity in the population 
of GPs discussed herein are not the only modes possible. 
Any method of mutagenesis that preserves at least a 
large fraction of the information obtained from one
selection and then introduces other mutations in the
same domain will work. The limiting factors are the
number of independent transformants that can be produced
and the amount of enrichment one can achieve through
affinity separation. Therefore the preferred embodiment
uses a method of mutagenesis that focuses mutations into
those residues that are most likely to affect the
binding properties of the PBD and are least likely to
destroy the underlying structure of the IPBD.

Other modes of mutagenesis might allow other GPs to
be considered. For example, the bacteriophage λ is not
a useful cloning vehicle for cassette mutagenesis
because of the plethora of restriction sites. One can,
however, use single-stranded-oligo-nt-directed
mutagenesis on λ without the need for unique
restriction sites. No one has used single-stranded-oligo-nt-
directed mutagenesis to introduce the high level of
diversity called for in the present invention, but if it
is possible, such a method would allow use of phage with
large genomes.

IV.H. Operative Cloning Vector

The operative cloning vector (OCV) is a replicable
nucleic acid used to introduce the chimeric ipbd-osp or
osp-ipbd gene into the genetic package. When the
genetic package is a virus, it may serve as its own OCV.
For cells and spores, the OCV may be a plasmid, a virus,
a phagemid, or a chromosome.

The OCV is preferably small (less than 10 kb),
stable (even after insertion of at least 1 kb DNA),
present in multiple copies within the host cell, and
selectable with appropriate media. It is desirable that
cassette mutagenesis be practical in the OCV;
preferably, at least 25 restriction enzymes are
available that do not cut the OCV. It is likewise
desirable that single-stranded mutagenesis be practical.
If a suitable OCV does not already exist, it may be
engineered by manipulation of available vectors.

When the GP is a bacterial cell or spore, the OCV 
is preferably a plasmid because genes on plasmids are 
much more easily constructed and mutated than are genes 
in the bacterial chromosome. When bacteriophage are to 
be used, the osp-ipbd gene is inserted into the phage 
genome. The synthetic osp-ipbd genes can be constructed 
in small vectors and transferred to the GP genome when 
complete.

Phage such as M13 do not confer antibiotic
resistance on the host so that one cannot select for
cells infected with M13. An antibiotic resistance gene
can be engineered into the M13 genome (HINE80). More
virulent phage, such as ΦX174, make discernable plaques
that can be picked, in which case a resistance gene is
not essential; furthermore, there is no room in the
ΦX174 virion to add any new genetic material. Inability
to include an antibiotic resistance gene is a
disadvantage because it limits the number of GPs that
can be screened.

It is preferred that GP(IPBD) carry a selectable 
marker not carried by wtGP. It is also preferred that 
wtGP carry a selectable marker not carried by GP(IPBD) . 

A derivative of M13 is the most preferred OCV when
the phage also serves as the GP. Wild-type M13 does not
confer any resistances on infected cells; M13 is a pure
parasite. A "phagemid" is a hybrid between a phage and
a plasmid, and is used in this invention.
Double-stranded plasmid DNA isolated from phagemid-bearing
cells is denoted by the standard convention, e.g. pXY24.
Phage prepared from these cells would be designated
XY24. Phagemids such as Bluescript K/S (sold by
Stratagene) are not preferred for our purposes because
Bluescript does not contain the full genome of M13 and
must be rescued by coinfection with competent wild-type
M13. Such coinfections could lead to genetic
recombination yielding heterogeneous phage unsuitable
for the purposes of the present invention. Phagemids
may be entirely suitable for developing a gene that
causes an IPBD to appear on the surface of phage-like
genetic packages.

It is also well known that plasmids containing the
ColE1 origin of replication can be greatly amplified if
protein synthesis is halted in a log-phase culture.
Protein synthesis can be halted by addition of
chloramphenicol or other agents (MANI82).

The bacteriophage M13 bla 61 (ATCC 37039) is
derived from wild-type M13 through the insertion of the
β-lactamase gene (HINE80). This phage contains 8.13 kb
of DNA. M13 bla cat 1 (ATCC 37040) is derived from M13
bla 61 through the additional insertion of the
chloramphenicol resistance gene (HINE80); M13 bla cat 1
contains 9.88 kb of DNA. Although neither of these
variants of M13 contains the ColE1 origin of
replication, either could be used as a starting point to
construct a cloning vector with this feature.

IV.I. Transformation of cells:

When the GP is a cell, the population of GPs is 
created by transforming the cells with suitable OCVs . 
When the GP is a phage, the phage are genetically 
engineered and then transfected into host cells suitable 
for amplification. When the GP is a spore, cells 
capable of sporulation are transformed with the OCV 
while in a normal metabolic state, and then sporulation 
is induced so as to cause the OSP-PBDs to be displayed. 
The present invention is not limited to any one method 
of transforming cells with DNA. The procedure given in 
the examples is a modification of that of Maniatis 
(p250, MANI82). One preferably obtains at least 10^7 and
more preferably at least 10^8 transformants/μg of CCC DNA.

The transformed cells are grown first under
non-selective conditions that allow expression of plasmid
genes and then selected to kill untransformed cells.
Transformed cells are then induced to express the
osp-pbd gene at the appropriate level of induction. The GPs
carrying the IPBD or PBDs are then harvested by methods
appropriate to the GP at hand, generally, centrifugation
to pelletize GPs and resuspension of the pellets in
sterile medium (cells) or buffer (spores or phage).
They are then ready for verification that the display
strategy was successful (where the GPs all display a
"test" IPBD) or for affinity selection (where the GPs
display a variety of different PBDs).

IV.J. Verification of Display Strategy:

The harvested packages are tested to determine
whether the IPBD is present on the surface. In any
tests of GPs for the presence of IPBD on the GP surface,
any ions or cofactors known to be essential for the
stability of IPBD or AfM(IPBD) are included at
appropriate levels. The tests can be done: a) by
affinity labeling, b) enzymatically, c)
spectrophotometrically, d) by affinity separation, or e)
by affinity precipitation. The AfM(IPBD) in this step
is one picked to have strong affinity (preferably,
K_d < 10^-11 M) for the IPBD molecule and little or no
affinity for the wtGP. For example, if BPTI were the
IPBD, trypsin, anhydrotrypsin, or antibodies to BPTI
could be used as the AfM(BPTI) to test for the presence
of BPTI. Anhydrotrypsin, a trypsin derivative with
serine 195 converted to dehydroalanine, has no
proteolytic activity but retains its affinity for BPTI
(AKOH72 and HUBE77).

Preferably, the presence of the IPBD on the surface
of the GP is demonstrated through the use of a soluble,
labeled derivative of an AfM(IPBD) with high affinity for
IPBD. The label could be: a) a radioactive atom such
as 125I, b) a chemical entity such as biotin, or c) a
fluorescent entity such as rhodamine or fluorescein.
The labeled derivative of AfM(IPBD) is denoted as
AfM(IPBD)*. The preferred procedure is:

1) mix AfM(IPBD)* with GPs that are to be tested for
the presence of IPBD; conditions of mixing should
favor binding of IPBD to AfM(IPBD)*,

2) separate GPs from unbound AfM(IPBD)* by use of:

a) a molecular sizing filter that will pass
AfM(IPBD)* but not GPs,

b) centrifugation, or

c) a molecular sizing column (such as Sepharose or
Sephadex) that retains free AfM(IPBD)* but not
GPs,

3) quantitate the AfM(IPBD)* bound by GPs.

Alternatively, if the IPBD has a known biochemical
activity (enzymatic or inhibitory), its presence on the
GP can be verified through this activity. For example,
if the IPBD were BPTI, then one could use the
stoichiometric inactivation of trypsin not only to demonstrate
the presence of BPTI, but also to quantitate the amount.

If the IPBD has strong, characteristic absorption 
bands in the visible or UV that are distinct from 
absorption by the wtGP, then another alternative for 
measuring the IPBD displayed on the GP is a
spectrophotometric measurement. For example, if IPBD
were azurin, the visible absorption could be used to
identify GPs that display azurin.

Another alternative is to label the GPs and measure
the amount of label retained by immobilized AfM(IPBD).
For example, the GPs could be grown with a radioactive
precursor, such as 32P or 3H-thymidine, and the
radioactivity retained by immobilized AfM(IPBD)
measured.

Another alternative is to use affinity
chromatography; the ability of a GP bearing the IPBD to bind a
matrix that supports an AfM(IPBD) is measured by
reference to the wtGP.

Another alternative for detecting the presence of 
IPBD on the GP surface is affinity precipitation. 

If random DNA has been used, then affinity
selection procedures are used to obtain a clonal isolate
that has the display-of-IPBD phenotype. Alternatively,
clonal isolates may be screened for the display-of-IPBD
phenotype. The tests of this step are applied to one or
more of these clonal isolates.

If no isolates that bind to the affinity molecule 
are obtained we take corrective action as disclosed 
below. 

If one or more of the tests above indicates that 
the IPBD is displayed on the GP surface, we verify that 
the binding of molecules having known affinity for IPBD 
is due to the chimeric osp-ipbd gene through the use of
standard genetic and biochemical techniques, such as: 

1) transferring the osp-ipbd gene into the parent GP 
to verify that osp-ipbd confers binding, 

2) deleting the osp-ipbd gene from the isolated GP to 
verify that loss of osp-ipbd causes loss of 
binding, 

3) showing that binding of GPs to AfM(IPBD) correlates 
with [XINDUCE] (in those cases that expression of 
osp-ipbd is controlled by [XINDUCE] ) , and 

4) showing that binding of GPs to AfM(IPBD) is 
specific to the immobilized AfM(IPBD) and not to 
the support matrix. 

Variation of: a) binding of GPs by soluble
AfM(IPBD)*, b) absorption caused by IPBD, and c)
biochemical reactions of IPBD are linear in the amount
of IPBD displayed. Presence of IPBD on the GP surface
is indicated by a strong correlation between [XINDUCE]
and the reactions that are linear in the amount of IPBD.
Leakiness of the promoter is not likely to present
problems of high background with assays that are linear
in the amount of IPBD. These experiments may be quicker
and easier than the genetic tests. Interpreting the
effect of [XINDUCE] on binding to a {AfM(IPBD)} column,
however, may be problematic unless the regulated
promoter is completely repressed in the absence of
[XINDUCE]. The affinity retention of GP(IPBD)s is not
linear in the number of IPBDs/GP and there may be, for
example, little phenotypic difference between GPs
bearing 5 IPBDs and GPs bearing 50 IPBDs. The
demonstration that binding is to AfM(IPBD) and the
genetic tests are essential; the tests with XINDUCE are
optional.

We sequence the relevant ipbd gene fragment from
each of several clonal isolates to determine the
construction. We also establish the maximum salt
concentration and pH range for which the GP(IPBD) binds
the chosen AfM(IPBD). This is preferably done by
measuring, as a function of salt concentration and pH,
the retention of AfM(IPBD)* on molecular sizing filters
that pass AfM(IPBD)* but not GP. This information will
be used in refining the affinity selection scheme.

IV.K. Analysis and Correction of Display Problems

If the IPBD is displayed on the outside of the GP,
and if that display is clearly caused by the introduced
osp-ipbd gene, we proceed with variegation; otherwise we
analyze the result and adopt appropriate corrective
measures. If we have unsuccessfully attempted to fuse
an ipbd fragment to a natural osp fragment, our options
are: 1) pick a different fusion to the same osp by a)
using the opposite end of osp, b) keeping more or fewer
residues from osp in the fusion, for example, in
increments of 3 or 4 residues, c) trying a known or
predicted domain boundary, or d) trying a predicted loop or
turn position, 2) pick a different osp, or 3) switch to
the random DNA method. If we have just tried the random DNA
method unsuccessfully, our options are: 1) choose a
different relationship between ipbd fragment and random
DNA (ipbd first, random DNA second or vice versa), 2)
try a different degree of partial digestion, a different
enzyme for partial digestion, a different degree of
shearing or a different source of natural DNA, or 3)
switch to the natural OSP method. If all reasonable
OSPs of the current GP have been tried and the random
DNA method has been tried, both without success, we pick
a new GP.

We may illustrate the ways in which problems may be
attacked by using the example of BPTI as the IPBD, the
M13 phage as the GP, and the major coat (gene VIII)
protein as the OSP. The following amino-acid sequence,
called AA_seq2, illustrates how the sequence for mature
BPTI (shown underscored) may be inserted immediately
after the signal sequence of M13 precoat protein
(indicated by the arrow) and before the sequence for the
M13 CP.

AA_seq2

   55   60   65   70   75   80   85   90   95  100
GLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAAEGDDPAKAAFNSLQASAT

  105  110  115  120  125  130
EYIGYAWAMVVVIVGATIGIKLFKKFTSKAS

We adopt the convention that sequence numbers of 
fusion proteins refer to the fusion, as coded, unless 
otherwise noted. Thus the alanine that begins M13 CP is 
referred to as "number 82" , "number 1 of M13 CP", or 
"number 59 of the mature BPTI-M13 CP fusion". 
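The numbering convention can be made concrete with a small Python helper. It is a sketch under two assumptions inferred from the example above: the signal sequence spans residues 1-23 of the fusion as coded (since residue 82 is number 59 of the mature fusion) and mature BPTI spans residues 24-81 (since M13 CP begins at residue 82). The function name `describe_residue` is hypothetical.

```python
SIGNAL_LEN = 23   # assumed: implied by "number 82" == "number 59 of the mature fusion"
BPTI_LEN = 58     # assumed: BPTI occupies residues 24..81, M13 CP starts at 82


def describe_residue(n):
    """Translate a residue number of the fusion, as coded, into
    (region, number within region, number in the mature fusion or None)."""
    if n < 1:
        raise ValueError("residue numbers start at 1")
    if n <= SIGNAL_LEN:
        return ("signal", n, None)            # removed from the mature protein
    mature = n - SIGNAL_LEN
    if mature <= BPTI_LEN:
        return ("BPTI", mature, mature)
    return ("M13 CP", mature - BPTI_LEN, mature)


if __name__ == "__main__":
    print(describe_residue(82))
```

Under these assumptions residue 82 maps to number 1 of M13 CP and number 59 of the mature fusion, matching the example in the text.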

It is desirable to determine where, exactly, the 
BPTI binding domain is being transported: is it 
remaining in the cytoplasm? Is it free within the 
periplasm? Is it attached to the inner membrane? 
Proteins in the periplasm can be freed through 
spheroplast formation using lysozyme and EDTA in a 
concentrated sucrose solution (BIRD67, MALA64). If BPTI
were free in the periplasm, it would be found in the
supernatant. Trypsin labeled with 125I would be mixed
with supernatant and passed over a non-denaturing
molecular sizing column and the radioactive fractions
collected. The radioactive fractions would then be
analyzed by SDS-PAGE and examined for BPTI-sized bands
by silver staining.

Spheroplast formation exposes proteins anchored in
the inner membrane. Spheroplasts would be mixed with
AHTrp* and then either filtered or centrifuged to
separate them from unbound AHTrp*. After washing with
hypertonic buffer, the spheroplasts would be analyzed
for extent of AHTrp* binding.

If BPTI were found free in the periplasm, then we
would expect that the chimeric protein was being cleaved
both between BPTI and the M13 mature coat sequence and
between BPTI and the signal sequence. In that case, we
should alter the BPTI/M13 CP junction by inserting vgDNA
at codons for residues 78-82 of AA_seq2.

If BPTI were found attached to the inner membrane,
then two hypotheses can be formed. The first is that
the chimeric protein is being cut after the signal
sequence, but is not being incorporated into LG7 virion;
the treatment would also be to insert vgDNA between
residues 78 and 82 of AA_seq2. The alternative
hypothesis is that BPTI could fold and react with
trypsin even if signal sequence is not cleaved.
N-terminal amino acid sequencing of trypsin-binding
material isolated from cell homogenate determines what
processing is occurring. If signal sequence were being
cleaved, we would use the procedure above to vary
residues between C78 and A82; subsequent passes would
add residues after residue 81. If signal sequence were
not being cleaved, we would vary residues between 23 and
27 of AA_seq2. Subsequent passes through that process
would add residues after 23.

If BPTI were found neither in the periplasm nor on
the inner membrane, then we would expect that the fault
was in the signal sequence or the signal-sequence-to-
BPTI junction. The treatment in this case would be to
vary residues between 23 and 27.
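The diagnostic reasoning of the preceding paragraphs reduces to a small decision table, sketched here in Python. The function name `suggest_variegation` and the location labels are hypothetical conveniences; the residue ranges are those of AA_seq2 as given in the text.

```python
SIGNAL_BPTI_JUNCTION = (23, 27)   # signal-sequence-to-BPTI junction
BPTI_CP_JUNCTION = (78, 82)       # BPTI-to-M13-CP junction


def suggest_variegation(location, signal_cleaved=None):
    """Map where BPTI was found to the AA_seq2 residues to variegate.

    location       : "periplasm", "inner_membrane", or "neither"
    signal_cleaved : result of N-terminal sequencing (True/False); only
                     needed when BPTI is found on the inner membrane
    """
    if location == "periplasm":
        # Fusion is cleaved on both sides of BPTI: alter the BPTI/M13 CP junction.
        return BPTI_CP_JUNCTION
    if location == "inner_membrane":
        if signal_cleaved is None:
            raise ValueError("N-terminal sequencing result is required")
        return BPTI_CP_JUNCTION if signal_cleaved else SIGNAL_BPTI_JUNCTION
    if location == "neither":
        # Fault lies in the signal sequence or its junction with BPTI.
        return SIGNAL_BPTI_JUNCTION
    raise ValueError(f"unknown location: {location!r}")
```

Every foreseen outcome points to one of only two regions, which is why trying both variegations directly can be quicker than doing the analysis first.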

Analytical experiments to determine what has gone
wrong take time and effort and, for the foreseen
outcomes, indicate variations in only two regions.
Therefore we believe it prudent to try the synthetic
experiments described below without doing the analysis.
For example, these six experiments that introduce
variegation into the bpti-gene VIII fusion could be
tried:

1) 3 variegated codons between residues 78 and 82
using olig#12 and olig#13,

2) 3 variegated codons between residues 23 and 27
using olig#14 and olig#15,

3) 5 variegated codons between residues 78 and 82
using olig#13 and olig#12a,

4) 5 variegated codons between residues 23 and 27
using olig#15 and olig#14a,

5) 7 variegated codons between residues 78 and 82
using olig#13 and olig#12b, and

6) 7 variegated codons between residues 23 and 27
using olig#15 and olig#14b.
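It can help to compare the diversity of each experiment with the number of transformants obtainable. The Python sketch below assumes each variegated codon is of the form used in the oligo-nts of this section (four possible bases at each of the first two positions, T or G at the third), which gives 32 codons per position; such codons can encode all 20 amino acids. The function name `library_sizes` is hypothetical.

```python
CODONS_PER_POSITION = 4 * 4 * 2    # 4 bases, 4 bases, then T or G
AMINO_ACIDS_PER_POSITION = 20      # all 20 amino acids are reachable


def library_sizes(n_codons):
    """Return (distinct DNA sequences, distinct full-length peptides)
    for n_codons variegated codons."""
    return (CODONS_PER_POSITION ** n_codons,
            AMINO_ACIDS_PER_POSITION ** n_codons)


if __name__ == "__main__":
    for n in (3, 5, 7):
        dna, protein = library_sizes(n)
        print(f"{n} codons: {dna:.2e} DNA sequences, {protein:.2e} peptides")
```

Three variegated codons give 32,768 DNA sequences, well within a library of 10^7 to 10^8 transformants; seven codons give about 3.4 x 10^10, so such a library would be sampled only partially.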

To alter the BPTI-M13 CP junction, we introduce DNA
variegated at codons for residues between 78 and 82 into
the SphI and SfiI sites of pLG7. The residues after the
last cysteine are highly variable in amino acid
sequences homologous to BPTI, both in composition and
length; in Table 25 these residues are denoted as G79,
G80, and A81. The first part of the M13 CP is denoted
as A82, E83, and G84. One of the oligo-nts olig#12,
olig#12a, or olig#12b and the primer olig#13 are
synthesized by standard methods. The oligo-nts are:

residue       75  76  77  78  79  80  81  82  83
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|GCT|GAA|-

   84  85  86  87  88  89  90  91
   GGT|GAT|GAT|CCG|GCC|AAA|GCG|GCC|gcg|cc 3'    olig#12

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-

   82  83  84  85  86  87
   GCT|GAA|GGT|GAT|GAT|CCG|-

   88  89  90  91
   GCC|AAA|GCG|GCC|gcg|cc 3'    olig#12a

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-

   81c 81d 82  83  84  85  86  87
   qfk|qfk|GCT|GAA|GGT|GAT|GAT|CCG|-

   88  89  90  91
   GCC|AAA|GCG|GCC|gcg|cc 3'    olig#12b

residue   91  90  89  88  87  86
5' gg|cgc|GGC|CGC|TTT|GGC|CGG|ATC 3'    olig#13

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and 0.30
G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and 0.22
G), and k is a mixture of equal parts of T and G. The
bases shown in lower case at either end are spacers and
are not incorporated into the cloned gene. The primer
is complementary to the 3' end of each of the longer
oligo-nts. One of the variegated oligo-nts and the
primer olig#13 are combined in equimolar amounts and
annealed. The dsDNA is completed with all four (nt)TPs
and Klenow fragment. The resulting dsDNA and RF pLG7
are cut with both SfiI and SphI, purified, mixed, and
ligated. We then select a transformed clone that, when
induced with IPTG, binds AHTrp.
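The amino-acid distribution encoded by one qfk codon follows directly from the base mixtures. The Python sketch below assumes that the three positions are drawn independently with the stated probabilities and that the standard genetic code applies; the function name `qfk_distribution` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks stops.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(product(BASES, repeat=3))}

Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}   # first codon position
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}   # second codon position
K = {"T": 0.50, "G": 0.50}                         # third codon position


def qfk_distribution():
    """Return {amino acid or '*': probability} for a single qfk codon."""
    dist = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(Q.items(), F.items(), K.items()):
        residue = CODON_TABLE[b1 + b2 + b3]
        dist[residue] = dist.get(residue, 0.0) + p1 * p2 * p3
    return dist


if __name__ == "__main__":
    for residue, prob in sorted(qfk_distribution().items(), key=lambda kv: -kv[1]):
        print(f"{residue}  {prob:.4f}")
```

Because the third base is restricted to T or G, the only reachable stop codon is TAG, so under these assumptions the chance of a stop at any one variegated position is 0.26 x 0.40 x 0.5 = 0.052.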

To vary the junction between M13 signal sequence
and BPTI, we introduce DNA variegated at codons for
residues between 23 and 27 into the KpnI and XhoI sites
of pLG7. The first three residues are highly variable in
amino acid sequences homologous to BPTI. Homologous
sequences also vary in length at the amino terminus.
One of the oligo-nts olig#14, olig#14a, or olig#14b and
the primer olig#15 are synthesized by standard methods.
The oligo-nts are:

residue      17  18  19  20  21  22  23  24  25
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|-

   26  27  28  29  30
   qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'    olig#14

residue    17  18  19  20  21  22  23  24  25  26
5' gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|qfk|-

   26a 26b 27  28  29  30
   qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'    olig#14a

residue      17  18  19  20  21  22  23  24  25  26
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|qfk|-

   26a 26b 26c 26d 27  28  29  30
   qfk|qfk|qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'    olig#14b

5' tcg|cgg|gcg|CTC|GAG|ACA|GAA 3'    olig#15

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G.
The bases shown in lower case at either end are spacers
and are not incorporated into the cloned gene. One of
the variegated oligo-nts and the primer are combined in
equimolar amounts and annealed. The dsDNA is completed
with all four (nt)TPs and Klenow fragment. The
resulting dsDNA and RF pLG7 are cut with both KpnI and
XhoI, purified, mixed, and ligated. We select a
transformed clone that, when induced with IPTG, binds
AHTrp or trp.
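That each primer anneals to the 3' end of its variegated partner can be checked by comparing the primer with the reverse complement of that 3' end. The Python sketch below uses the invariant 3' tails of olig#12 and olig#14 and the primers olig#13 and olig#15 as transcribed from the listings in this section; the function name `primer_anneals` is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of a DNA string (case-insensitive input)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def primer_anneals(oligo_3prime_tail, primer):
    """True if the primer is fully complementary to the 3' end of the oligo."""
    return oligo_3prime_tail.upper().endswith(revcomp(primer))


# 3' tails (residue 86 onward, and residue 27 onward) with spacers in lower case.
OLIG12_TAIL = "GATCCGGCCAAAGCGGCCgcgcc"
OLIG13 = "ggcgcGGCCGCTTTGGCCGGATC"
OLIG14_TAIL = "TTCTGTCTCGAGcgcccgcga"
OLIG15 = "tcgcgggcgCTCGAGACAGAA"

if __name__ == "__main__":
    print("olig#13 on olig#12:", primer_anneals(OLIG12_TAIL, OLIG13))
    print("olig#15 on olig#14:", primer_anneals(OLIG14_TAIL, OLIG15))
```

Since olig#12a, olig#12b, olig#14a, and olig#14b share the same 3' tails as olig#12 and olig#14, the same check covers all six variegated oligo-nts.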

Other numbers of variegated codons could be used. 

If none of these approaches produces a working 
chimeric protein, we may try a different signal 
sequence. If that doesn't work, we may try a different 
OSP. 

V. AFFINITY SELECTION OF TARGET-BINDING MUTANTS

V.A. Affinity Separation Technology, Generally 

Affinity separation is used initially in the
present invention to verify that the display system is
working, i.e., that a chimeric outer surface protein has
been expressed and transported to the surface of the
genetic package and is oriented so that the inserted
binding domain is accessible to target material. When
used for this purpose, the binding domain is a known
binding domain for a particular target and that target
is the affinity molecule used in the affinity separation
process. For example, a display system may be validated
by inserting DNA encoding BPTI into a gene
encoding an outer surface protein of the genetic package
of interest, and testing for binding to anhydrotrypsin,
which is normally bound by BPTI.

If the genetic packages bind to the target, then we 
have confirmation that the corresponding binding domain 
is indeed displayed by the genetic package. Packages 
which display the binding domain (and thereby bind the 
target) are separated from those which do not. 

Once the display system is validated, it is 
possible to use a variegated population of genetic 
packages which display a variety of different potential 
binding domains, and use affinity separation technology 
to determine how well they bind to one or more targets. 
This target need not be one bound by a known binding
domain which is parental to the displayed binding
domains, i.e., one may select for binding to a new
target.

For example, one may variegate a BPTI binding 
domain and test for binding, not to trypsin, but to 
another serine protease, such as human neutrophil 
elastase or cathepsin G, or even to a wholly unrelated 
target, such as horse heart myoglobin. 

The term "affinity separation means" includes, but 
is not limited to: a) affinity column chromatography, b) 
batch elution from an affinity matrix material, c) batch 
elution from an affinity material attached to a plate,
d) fluorescence activated cell sorting, and e)
electrophoresis in the presence of target material.
"Affinity material" is used to mean a material with
affinity for the material to be purified, called the
"analyte". In most cases, the association of the
affinity material and the analyte is reversible so that
the analyte can be freed from the affinity material once
the impurities are washed away.

The procedures described in sections V.H, V.I and 
V.J are not required for practicing the present 
invention, but may facilitate the development of novel 
binding proteins thereby. 

V.B. Affinity Chromatography, Generally 

Affinity column chromatography, batch elution from 
an affinity matrix material held in some container, and 
batch elution from a plate are very similar and 
hereinafter will be treated under "affinity
chromatography."

If affinity chromatography is to be used, then: 

1) the molecules of the target material must be of 
sufficient size and chemical reactivity to be 
applied to a solid support suitable for affinity 
separation, 

2) after application to a matrix, the target material
preferably does not react with water,

3) after application to a matrix, the target material
preferably does not bind or degrade proteins in a
non-specific way, and

4) the molecules of the target material must be
sufficiently large that attaching the material to a
matrix allows enough unaltered surface area
(generally at least 500 Å², excluding the atom that
is connected to the linker) for protein binding.

Affinity chromatography is the preferred separation
means, but FACS, electrophoresis, or other means may
also be used.

V.C. Fluorescent-Activated Cell Sorting, Generally

Fluorescent-activated cell sorting involves use of
an affinity material that is fluorescent per se or is
labeled with a fluorescent molecule. Current
commercially available cell sorters require 800 to 1000
molecules of fluorescent dye, such as Texas red, bound
to each cell. FACS can sort 10^3 cells or viruses/sec.

FACS (e_^ FACStar from Beckton-Dickinson, Mountain 
View, CA) is most appropriate for bacterial cells and 

the sensitivity of the machines requires 
scores because the sensinvx x 

approximately !000 molecules of fluorescent label bound 
to each GP to accomplish a separation. OSPs such as 

a h =>io 4 /cell, often as much 
OmpA, OmpF, ompC are present at ,10 /cell, 

as loVcell. Thus use of FACS with PBDs displayed on one 
o£ the OSPs of a bacterial cell is attractive. This is 
particularly true if the target is q uite small so that 
attachment to a matrix has a much greater effect than 
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would attachment to a dye. To optimize FACS separation 
of GPs, we use a derivative of Afm(IPBD) that is labeled 
with a fluorescent molecule, denoted Afm(IPBD)*. The 
variables to be optimized include: a) amount of IPBD/GP, 
b) concentration of Afm(IPBD)*, c) ionic strength, d) 
concentration of GPs, and e) parameters pertaining to 
operation of the FACS machine. Because Afm(IPBD)* and 
GPs interact in solution, the binding will be linear in 
both [Afm(IPBD) *] and [displayed IPBD] . Preferably, 
these two parameters are varied together. The other 
parameters can be optimized independently. 
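
As a rough illustration of the sensitivity argument above, the following Python sketch estimates how many labeled AfM(IPBD)* molecules would be bound to one GP and compares the result with the threshold of approximately 1000 labels quoted in the text. The single-site binding model, the function names, and the example Kd and concentrations are assumptions made for illustration only; they are not taken from the specification. When the label concentration is well below Kd, the estimate is proportional to both the copy number and the label concentration, which is the linear regime the text refers to.

```python
# Hypothetical helper names. The 1000-label threshold and the 10^4 to 10^5
# copies/cell figures come from the text; the binding model is an assumption.

FACS_MIN_LABELS = 1000  # approximate labels per GP needed for a separation


def labels_bound_per_gp(copies_per_gp: float, label_conc_molar: float,
                        kd_molar: float) -> float:
    """Expected number of labeled AfM(IPBD)* molecules bound to one GP.

    Assumes independent single-site binding with free label in excess,
    so the fractional occupancy of each displayed IPBD is [L] / ([L] + Kd).
    """
    occupancy = label_conc_molar / (label_conc_molar + kd_molar)
    return copies_per_gp * occupancy


def sortable(copies_per_gp: float, label_conc_molar: float,
             kd_molar: float) -> bool:
    """True if the estimated bound label reaches the FACS threshold."""
    bound = labels_bound_per_gp(copies_per_gp, label_conc_molar, kd_molar)
    return bound >= FACS_MIN_LABELS


if __name__ == "__main__":
    kd = 1e-6  # hypothetical moderate-affinity Kd, mol/L
    for copies in (1e4, 1e5):
        for conc in (1e-8, 1e-7, 1e-6):
            n = labels_bound_per_gp(copies, conc, kd)
            print(f"copies/GP={copies:.0e}  [AfM*]={conc:.0e} M  "
                  f"labels={n:.0f}  sortable={sortable(copies, conc, kd)}")
```

Under these assumed numbers, a GP displaying 10⁴ copies needs a label concentration approaching Kd before it becomes sortable, whereas a GP displaying 10⁵ copies reaches the threshold at a tenfold lower concentration, which is one way to see why IPBD/GP and [AfM(IPBD)*] are best varied together.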

If FACS is to be used as the affinity separation 
means, then: 

1) the molecules of the target material must be of 
sufficient size and chemical reactivity to be 
conjugated to a suitable fluorescent dye or the 
target must itself be fluorescent, 

2) after any necessary fluorescent labeling, the 
target preferably does not react with water, 

3) after any necessary fluorescent labeling, the 
target material preferably does not bind or degrade 
proteins in a non-specific way, and 

4) the molecules of the target material must be 
sufficiently large that attaching the material to a 
suitable dye allows enough unaltered surface area 
(generally at least 500 Å², excluding the atom that 
is connected to the linker) for protein binding. 

V.D. Affinity Electrophoresis, Generally 

Electrophoretic affinity separation involves 
electrophoresis of viruses or cells in the presence of 
target material, wherein the binding of said target 
material changes the net charge of the virus particles 
or cells. It has been used to separate bacteriophages 
on the basis of charge (SERW87). 

Electrophoresis is most appropriate to 
bacteriophage because of their small size (SERW87). 
Electrophoresis is a preferred separation means if the 
target is so small that chemically attaching it to a 
column or to a fluorescent label would essentially 
change the entire target. For example, chloroacetate 
ions contain only seven atoms and would be essentially 
altered by any linkage. GPs that bind chloroacetate 
would become more negatively charged than GPs that do 
not bind the ion and so these classes of GPs could be 
separated. 

If affinity electrophoresis is to be used, then: 

1) the target must either be charged or of such a 
nature that its binding to a protein will change 
the charge of the protein, 

2) the target material preferably does not react with 
water, 

3) the target material preferably does not bind or 
degrade proteins in a non-specific way, and 

4) the target must be compatible with a suitable gel 
material. 

The present invention makes use of affinity 
separation of bacterial cells, or bacterial viruses (or 
other genetic packages), to enrich a population for those 
cells or viruses carrying genes that code for proteins 
with desirable binding properties. 

V.E. Target Materials 

The present invention may be used to select for 
binding domains which bind to one or more target 
materials, and/or fail to bind to one or more target 
materials. Specificity, of course, is the ability of a 
binding molecule to bind strongly to a limited set of 
target materials, while binding more weakly or not at 
all to another set of target materials from which the 
first set must be distinguished. 

The target materials may be organic macromolecules, 
such as polypeptides, lipids, polynucleic acids, and 
polysaccharides, but are not so limited. Almost any 
molecule that is stable in aqueous solvent may be used 
as a target. The following list of possible targets is 
given as illustration and not as limitation. The 
categories are not strictly mutually exclusive. The 
omission of any category is not to be construed to imply 
that said category is unsuitable as a target. Merck 
Index refers to the Eleventh Edition. 

A. Peptides 

1) human β-endorphin (Merck Index 3528) 
2) dynorphin (MI 3458) 
3) Substance P (MI 8834) 
4) Porcine somatostatin (MI 8671) 
5) human atrial natriuretic factor (MI 887 
6) human calcitonin 
7) glucagon 

B. Proteins 

I. Soluble Proteins 

a. Hormones 

1) human TNF (MI 9411) 
2) Interleukin-1 (MI 4895) 
3) Interferon-γ (MI 4894) 
4) Thyrotropin (MI 9709) 
5) Interferon-α (MI 4892) 
6) Insulin (MI 4887, p. 789) 

b. Enzymes 

1) human neutrophil elastase 
2) human thrombin 
3) human Cathepsin G 
4) human tryptase 
5) human chymase 
6) human blood clotting Factor Xa 
7) any retro-viral Pol protease 
8) any retro-viral Gag protease 
9) dihydrofolate reductase 
10) Pseudomonas putida cytochrome P450 CAM 
11) human pyruvate kinase 
12) E. coli pyruvate kinase 
13) jack bean urease 
14) aspartate transcarbamylase (E. coli) 
15) ras protein 
16) any protein-tyrosine kinase 

c. Inhibitors 

1) aprotinin (MI 784) 
2) human α1-anti-trypsin 
3) phage λ cI (inhibits DNA transcription) 

d. Receptors 

1) TNF receptor 
2) IgE receptor 
3) LamB 
4) CD4 
5) IL-1 receptor 

e. Toxins 

1) ricin (also an enzyme) 
2) α-Conotoxin GI 
3) melittin 
4) Bordetella pertussis adenylate cyclase (also 
an enzyme) 
5) Pseudomonas aeruginosa hemolysin 

f. Other proteins 

1) horse heart myoglobin 
2) human sickle-cell haemoglobin 
3) human deoxy haemoglobin 
4) human CO haemoglobin 
5) human low-density lipoprotein (a 
lipoprotein) 
6) human IgG (combining site removed or 
blocked) (a glycoprotein) 
7) influenza haemagglutinin 
8) phage λ capsid 
9) fibrinogen 
10) HIV-1 gp120 
11) Neisseria gonorrhoeae pilin 
12) fibril or flagellar protein from spirochaete 
bacterial species such as those that cause 
syphilis, Lyme disease, or relapsing fever 
13) pro-enzymes such as prothrombin and 
trypsinogen 



II. Insoluble Proteins 

1) silk 
2) human elastin 
3) keratin 
4) collagen 
5) fibrin 



C. Nucleic acids 

a. DNA 

1) dsDNA 

   5'-ACTAGTCTC-3' 
   3'-TGATCAGAG-5' 

2) dsDNA 

   5'-CCGTCGAATCCGC-3' (SEQ ID NO:90) 
   3'-GGCAGTTTAGGCG-5' (SEQ ID NO:91) 
   (Note mismatch) 

3) ssDNA 

   5'-CGTAACCTCGTCATTA-3' 
   (No hairpin) (SEQ ID NO:92) 

4) ssDNA 

   5'-CCGTAGGT-┐ 
   3'-GGCATCCA-┘ 
   (Note hairpin) (SEQ ID NO:93) 

5) dsDNA with cohesive ends: 

   5'-CACGGCTATTACGGT-3' (SEQ ID NO:94) 
      3'-CCGATAATGCCA-5' (SEQ ID NO:95) 

b. RNA 

1) yeast Phe tRNA 
2) ribosomal RNA 
3) segment of mRNA 

D. Organic molecules (not peptide, protein, or nucleic 
acid) 

I. Small and monomeric 

1) cholesterol 
2) aspartame 
3) bilirubin 
4) morphine 
5) codeine 
6) heroin 
7) dichlorodiphenyltrichlorethane (DDT) 
8) prostaglandin PGE2 
9) actinomycin 
10) 2,2,3-trimethyldecane 
11) Buckminsterfullerene 
12) cortavazol (MI 2536, p. 397) 

II. Polymers 

1) cellulose 
2) chitin 

III. Others 

1) O-antigen of Salmonella enteritidis (a 
lipopolysaccharide) 

E. Inorganic compounds 

1) asbestos 
2) zeolites 
3) hydroxylapatite 
4) 111 face of crystalline silicon 
5) paulingite 
6) U(IV) (uranium ions) 
7) Au(III) (gold ions) 

F. Organometallic compounds 

1) iron(III) haem 
2) cobalt haem 
3) cobalamine 
4) (isopropylamino)6Cr(III) 

Serine proteases are an especially interesting 
class of potential target materials. Serine proteases 
are ubiquitous in living organisms and play vital roles 
in processes such as: digestion, blood clotting, 
fibrinolysis, immune response, fertilization, and 
post-translational processing of peptide hormones. 
Although the role these enzymes play is vital, 
uncontrolled or inappropriate proteolytic activity can 
be very damaging. Several serine proteases are directly 
involved in serious disease states. Uncontrolled 
neutrophil elastase (NE) (also known as leukocyte 
elastase) is thought to be the major cause of emphysema 
(BEIT86, HUBB86, HUBB89, HUTC87, SOMM90, WEWE87) whether 
caused by congenital lack of α-1-antitrypsin or by 
smoking. NE is also implicated as an essential 
ingredient in the pernicious cycle of: 

   (excess secretion of proteases by neutrophils) 
   -> (inflammation) 
   -> (recruitment of neutrophils) 
   -> (excess secretion of proteases by neutrophils) 

observed in cystic fibrosis (CF) (NADE90). 
Inappropriate NE activity is very harmful and to stop 
the progression of emphysema or to alleviate the 
symptoms of CF, an inhibitor of very high affinity is 
needed. The inhibitor must be very specific to NE lest 
it inhibit other vital serine proteases or esterases. 
Nadel (NADE90) has suggested that onset of excess 
secretion is initiated by 10⁻¹⁰ M NE; thus, the inhibitor 
must reduce the concentration of free NE to well below 
this level. Thus human neutrophil elastase is a 
preferred target and a highly stable protein is a 
preferred IPBD. In particular, BPTI, ITI-D1, or another 
BPTI homologue is a preferred IPBD for development of an 
inhibitor to HNE. Other preferred IPBDs for making an 
inhibitor to HNE include CMTI-III, SLPI, Eglin, 
α-conotoxin GI, and Ω-conotoxins. 
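
To make the affinity requirement concrete, the sketch below computes the free NE concentration left at equilibrium for a given inhibitor Kd and checks it against the 10⁻¹⁰ M onset level attributed to Nadel. It assumes simple 1:1 reversible binding at equilibrium; the total NE and inhibitor concentrations and the function name are hypothetical values chosen for illustration and do not come from the specification.

```python
import math

NE_ONSET_MOLAR = 1e-10  # NE level said to initiate excess secretion (NADE90)


def free_enzyme(e_total: float, i_total: float, kd: float) -> float:
    """Free enzyme concentration (mol/L) for 1:1 reversible binding.

    Mass action gives E^2 + b*E - Kd*E_total = 0 with
    b = I_total - E_total + Kd. The positive root is evaluated in a
    form that avoids subtractive cancellation for either sign of b.
    """
    b = i_total - e_total + kd
    root = math.sqrt(b * b + 4.0 * kd * e_total)
    if b >= 0:
        return 2.0 * kd * e_total / (b + root)
    return (root - b) / 2.0


if __name__ == "__main__":
    e_total, i_total = 1e-8, 1e-6  # hypothetical total NE and inhibitor, mol/L
    for kd in (1e-8, 1e-10, 1e-12):
        free = free_enzyme(e_total, i_total, kd)
        print(f"Kd={kd:.0e} M  free NE={free:.2e} M  "
              f"below onset level: {free < NE_ONSET_MOLAR}")
```

With these assumed totals, an inhibitor with Kd near 10⁻⁸ M leaves free NE only marginally below the onset level, while a Kd near 10⁻¹² M leaves it several orders of magnitude lower.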

HNE is not the only serine protease for which an 
inhibitor would be valuable. Works concerning uses of 
protease inhibitors and diseases thought to result from 
inappropriate protease activity include: NADE87, REST88, 
SOMM90, and SOMM89. Tryptase and chymase may be 
involved in asthma; see FRAN89 and VAND89. There are 
reports that suggest that Proteinase 3 (also known as 
p29) is as important or even more important than HNE; 
see NILE89, ARNA90, KAOR88, CAMP90, and GUPT90. 
Cathepsin G is another protease that may cause disease 
when present in excess; see FERR90, PETE89, SALV87, and 
SOMM90. These works indicate that a problem exists and 
that blocking one or another protease might well 
alleviate a disease state. Some of the cited works 
report inhibitors having measurable affinity for a 
target protease, but none report truly excellent 
inhibitors that have Kd in the range of 10⁻¹² M as may be 
obtained by the method of the present invention. The 
same IPBDs used for HNE can be used for any serine 
protease. 

The present invention is not, however, limited to 
any of the above-identified target materials. The only 
limitation is that the target material be suitable for 
affinity separation. 

A supply of several milligrams of pure target 
material is desired. With HNE (as discussed in Examples 
II and III), 400 µg of enzyme is used to prepare 200 µl 
of ReactiGel beads. This amount of beads is sufficient 
for as many as 40 fractionations. Impure target 
material could be used, but one might obtain a protein 
that binds to a contaminant instead of to the target. 






The following information about the target material 
is highly desirable: 1) stability as a function of 
temperature, pH, and ionic strength, 2) stability with 
respect to chaotropes such as urea or guanidinium Cl, 3) 
pI, 4) molecular weight, 5) requirements for prosthetic 
groups or ions, such as haem or Ca+2, and 6) proteolytic 
activity, if any. It is also potentially useful to 
know: 1) the target's sequence, if the target is a 
macromolecule, 2) the 3D structure of the target, 3) 
enzymatic activity, if any, and 4) toxicity, if any. 

The user of the present invention specifies certain 
parameters of the intended use of the binding protein: 
1) the acceptable temperature range, 2) the acceptable 
pH range, 3) the acceptable concentrations of ions and 
neutral solutes, and 4) the maximum acceptable 
dissociation constant for the target and the SBD: 

   K_T = [Target][SBD] / [Target:SBD]. 

In some cases, the user may require discrimination 
between T, the target, and N, some non-target. Let 

   K_T = [T][SBD] / [T:SBD], and 

   K_N = [N][SBD] / [N:SBD], 

then K_T/K_N = ([T][N:SBD]) / ([N][T:SBD]). 

The user then specifies a maximum acceptable value for 
the ratio K_T/K_N. 
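
The two user-specified limits above can be checked numerically. The following Python sketch evaluates K_T, K_N, and their ratio from equilibrium concentrations and tests them against a maximum K_T and a maximum K_T/K_N. The function names and the example concentrations and limits are hypothetical; only the formulas are taken from the text.

```python
import math


def dissociation_constant(free_ligand: float, free_sbd: float,
                          complex_conc: float) -> float:
    """K = [ligand][SBD] / [ligand:SBD], all concentrations in mol/L."""
    if complex_conc <= 0:
        raise ValueError("complex concentration must be positive")
    return free_ligand * free_sbd / complex_conc


def specificity_ratio(t: float, n: float, t_sbd: float, n_sbd: float) -> float:
    """K_T / K_N = ([T][N:SBD]) / ([N][T:SBD]); the free [SBD] cancels."""
    return (t * n_sbd) / (n * t_sbd)


def meets_specification(k_t: float, k_n: float,
                        max_k_t: float, max_ratio: float) -> bool:
    """True if both the affinity and the discrimination limits are met."""
    return k_t <= max_k_t and (k_t / k_n) <= max_ratio


if __name__ == "__main__":
    # Hypothetical equilibrium concentrations, mol/L.
    sbd, t, t_sbd, n, n_sbd = 1e-9, 1e-9, 1e-8, 1e-6, 1e-9
    k_t = dissociation_constant(t, sbd, t_sbd)
    k_n = dissociation_constant(n, sbd, n_sbd)
    ratio = specificity_ratio(t, n, t_sbd, n_sbd)
    assert math.isclose(ratio, k_t / k_n)
    print(f"K_T={k_t:.1e} M  K_N={k_n:.1e} M  K_T/K_N={ratio:.1e}")
    print("meets spec:", meets_specification(k_t, k_n, 1e-9, 1e-3))
```

A smaller K_T/K_N means stronger discrimination in favor of the target, which is why the user states a maximum rather than a minimum for the ratio.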

The target material preferably is stable under the 
specified conditions of pH, temperature, and solution 
conditions. 

If the target material is a protease, one considers 
the following points: 

1) a highly specific protease can be treated like any 
other target, 

2) a general protease, such as subtilisin, may degrade 
the OSPs of the GP including OSP-PBDs; there are 
several alternative ways of dealing with general 
proteases, including: a) use a protease inhibitor 
as PPBD so that the SBD is an inhibitor of the 
protease, b) a chemical inhibitor may be used to 
prevent proteolysis (e.g., phenylmethylfluorosulfate 
(PMFS), which inhibits serine proteases), c) one or 
more active-site residues may be mutated to create 
an inactive protein (e.g., a serine protease in 
which the active serine is mutated to alanine), or 
d) one or more active-site amino acids of the 
protein may be chemically modified to destroy the 
catalytic activity (e.g., a serine protease in which 
the active serine is converted to anhydroserine), 

3) SBDs selected for binding to a protease need not be 
inhibitors; SBDs that happen to inhibit the 
protease target are a fairly small subset of SBDs 
that bind to the protease target, 

4) the more we modify the target protease, the less 
likely we are to obtain an SBD that inhibits the 
target protease, and 

5) if the user requires that the SBD inhibit the 
target protease, then the active site of the target 
protease must not be modified any more than 
necessary; inactivation by mutation or chemical 
modification are preferred methods of inactivation 
and a protein protease inhibitor becomes a prime 
candidate for IPBD. For example, BPTI has been 
mutated, by the methods of the present invention, 
to bind to proteases other than trypsin. 

Examples III-VI disclose that uninhibited serine 
proteases may be used as targets quite successfully and 
that protein protease inhibitors derived from BPTI and 
selected for binding to these immobilized proteases are 
excellent inhibitors. 

V.F. Immobilization or Labeling of Target Material 

For chromatography, FACS, or electrophoresis there 
may be a need to covalently link the target material to 
a second chemical entity. For chromatography the second 
entity is a matrix, for FACS the second entity is a 
fluorescent dye, and for electrophoresis the second 
entity is a strongly charged molecule. In many cases, 
no coupling is required because the target material 
already has the desired property of: a) immobility, b) 
fluorescence, or c) charge. In other cases, chemical or 
physical coupling is required. 

Various means may be used to immobilize or label 
the target materials. The means of immobilization or 
labeling is, in part, determined by the nature of the 
target. In particular, the physical and chemical nature 
of the target material and of its functional groups 
determine which types of immobilization reagents may be 
most easily used. 

For the purpose of selecting an immobilization 
method, it may be more helpful to classify target 
materials as follows: (a) solid, whether crystalline or 
amorphous, and insoluble in an aqueous solvent (e.g., 
many minerals, and fibrous organics such as cellulose 
and silk); (b) solid, whether crystalline or amorphous, 
and soluble in an aqueous solvent; (c) liquid, but 
insoluble in aqueous phase (e.g., 
2,3,3-trimethyldecane); or (d) liquid, and soluble in 
aqueous media. 

It is not necessary that the actual target material 
be used in preparing the immobilized or labeled analogue 
that is to be used in affinity separation; rather, 
suitable reactive analogues of the target material may 
be more convenient. If 2,3,3-trimethyldecane were the 
target material, for example, then 
2,3,3-trimethyl-10-aminodecane would be far easier to 
immobilize than the parental compound. Because the 
amino derivative is modified only at one end of the 
chain, it retains almost all of the shape and charge 
attributes that differentiate the parental compound 
from other alkanes. 

Target materials that do not have reactive 
functional groups may be immobilized by first creating a 
reactive functional group through the use of some 
powerful reagent, such as a halogen. For example, an 
alkane can be immobilized for affinity separation by 
first halogenating it and then reacting the halogenated 
derivative with an immobilized or immobilizable amine. 

In some cases, the reactive groups of the actual 
target material may occupy a part on the target molecule 
that is to be left undisturbed. In that case, 
additional functional groups may be introduced by 
synthetic chemistry. For example, the most reactive 
groups in cholesterol are on the steroid ring system, 
viz., -OH and >C=C-. We may wish to leave this ring 
system as it is so that it binds to the novel binding 
protein. In this case, we prepare an analogue having a 
reactive group attached to the aliphatic chain (such as 
26-aminocholesterol) and immobilize this derivative in 
a manner appropriate to the reactive group so attached. 

Two very general methods of immobilization are 
widely used. The first is to biotinylate the compound 
of interest and then bind the biotinylated derivative to 
immobilized avidin. The second method is to generate 
antibodies to the target material, immobilize the 
antibodies by any of numerous methods, and then bind the 
target material to the immobilized antibodies. Use of 
antibodies is more appropriate for larger target 
materials; small targets (those comprising, for example, 
ten or fewer non-hydrogen atoms) may be so completely 
engulfed by an antibody that very little of the target 
is exposed in the target-antibody complex. 

Non-covalent immobilization of hydrophobic 
molecules without resort to antibodies may also be used. 
A compound, such as 2,3,3-trimethyldecane, is blended 
with a matrix precursor, such as sodium alginate, and 
the mixture is extruded into a hardening solution. The 
resulting beads will have 2,3,3-trimethyldecane 
dispersed throughout and exposed on the surface. 

Other immobilization methods depend on the presence 
of particular chemical functionalities. A polypeptide 
will present -NH2 (N-terminal; Lysines), -COOH 
(C-terminal; Aspartic Acids; Glutamic Acids), -OH 
(Serines; Threonines; Tyrosines), and -SH (Cysteines). A 
polysaccharide has free -OH groups, as does DNA, which 
has a sugar backbone. 

The following table is a nonexhaustive review of 
reactive functional groups and potential immobilization 
reagents: 



Group              Reagent 

R-NH2              Derivatives of 2,4,6-trinitrobenzene 
                   sulfonates (TNBS) (CREI84, p. 11) 
R-NH2              Carboxylic acid anhydrides, e.g. 
                   derivatives of succinic anhydride, 
                   maleic anhydride, citraconic 
                   anhydride (CREI84, p. 11) 
R-NH2              Aldehydes that form reducible 
                   Schiff bases (CREI84, p. 12) 
guanido            cyclohexanedione derivatives 
                   (CREI84, p. 14) 
R-CO2H             Diazo cmpds (CREI84, p. 10) 
R-CO2-             Epoxides (CREI84, p. 10) 
R-OH               Carboxylic acid anhydrides 
Aryl-OH            Carboxylic acid anhydrides 
Indole ring        Benzyl halide and sulfenyl 
                   halides (CREI84, p. 19) 
R-SH               N-alkylmaleimides (CREI84, p. 21) 
R-SH               ethylene imine derivatives 
                   (CREI84, p. 21) 
R-SH               Aryl mercury compounds 
                   (CREI84, p. 21) 
R-SH               Disulfide reagents (CREI84, p. 23) 
Thiol ethers       Alkyl iodides (CREI84, p. 20) 
Ketones            Make Schiff's base and reduce 
                   with NaBH4 (CREI84, p. 12-13) 
Aldehydes          Oxidize to COOH, vide supra. 
R-SO3H             Convert to R-SO2Cl and react 
                   with immobilized alcohol or amine. 
R-PO3H             Convert to R-PO2Cl and react 
                   with immobilized alcohol or amine. 
CC double bonds    Add HBr and then make amine or 
                   thiol. 

The next table identifies the reactive groups of a 
number of potential targets. 

Compound (Item#, page)           Reactive groups or [derivatization] 

prostaglandin E2 (2893,1251)     -OH, -keto, -COOH, C=C 
aspartame (861,132)              -NH2, -COOH, -COOCH3 
haem (4558,732)                  vinyl, -COOH, Fe 
bilirubin (1235,189)             vinyl, -COOH, keto, -NH- 
morphine (6186,988)              -OH, -C=C-, reactive phenyl ring 
codeine (2459,384)               -OH, -C=C-, reactive phenyl ring 
dichlorodiphenyltrichlorethane   aliphatic chlorine 
benzo(a)pyrene (1113,172)        [Chlorinate -> amine, or make 
                                 sulfonate -> Aryl-SO2Cl] 
actinomycin (2804,441)           aryl-NH2, -OH 
cellulose                        self immobilized 
hydroxylapatite                  self immobilized 
cholesterol (2204,341)           -OH, >C=C- 

Note: Item# and page refer to The Merck Index, 11th 
Edition. 

The extensive literature on affinity chromatography 
and related techniques will provide further examples. 

Matrices suitable for use as support materials 
include polystyrene, glass, agarose and other 
chromatographic supports, and may be fabricated into 
beads, sheets, columns, wells, and other forms as desired. 
Suppliers of support material for affinity 
chromatography include: Applied Protein Technologies, 
Cambridge, MA; Bio-Rad Laboratories, Rockville Center, 
NY; Pierce Chemical Company, Rockford, IL. Target 
materials are attached to the matrix in accord with the 
directions of the manufacturer of each matrix 
preparation with consideration of good presentation of 
the target. 

Early in the selection process, relatively high 
concentrations of target materials may be applied to the 
matrix to facilitate binding; target concentrations may 
subsequently be reduced to select for higher affinity 
SBDs. 

V.G. Elution of Lower Affinity PBD-Bearing Genetic 
Packages 

The population of GPs is applied to an affinity 
matrix under conditions compatible with the intended use 
of the binding protein and the population is 
fractionated by passage of a gradient of some solute 
over the column. The process enriches for PBDs having 
affinity for the target and for which the affinity for 
the target is least affected by the eluants used. The 
enriched fractions are those containing viable GPs that 
elute from the column at greater concentration of the 
eluant. 

The eluants preferably are capable of weakening 
noncovalent interactions between the displayed PBDs and 
the immobilized target material. Preferably, the 
eluants do not kill the genetic package; the genetic 
message corresponding to successful mini-proteins is 
most conveniently amplified by reproducing the genetic 
package rather than by in vitro procedures such as PCR. 
The list of potential eluants includes salts (including 
Na+, NH4+, Rb+, SO4--, H2PO4-, citrate, K+, Li+, Cs+, 
HSO4-, CO3--, Ca++, Sr++, Cl-, PO4---, HCO3-, Mg++, Ba++, 
Br-, HPO4-- and acetate), acid, heat, compounds known to 
bind the target, and soluble target material (or 
analogues thereof). 

Because bacteria continue to metabolize during 
affinity separation, the choice of buffer components is 
more restricted for bacteria than for bacteriophage or 
spores. Neutral solutes, such as ethanol, acetone, 
ether, or urea, are frequently used in protein 
purification and are known to weaken non-covalent 
interactions between proteins and other molecules. Many 
of these species are, however, very harmful to bacteria 
and bacteriophage. Urea is known not to harm M13 up to 
8 M. Bacterial spores, on the other hand, are 
impervious to most neutral solutes. Several affinity 
separation passes may be made within a single round of 
variegation. Different solutes may be used in different 
analyses, salt in one, pH in the next, etc. 

Any ions or cofactors needed for stability of PBDs 
(derived from IPBD) or target are included in initial 
and elution buffers at appropriate levels. We first 
remove GP(PBD)s that do not bind the target by washing 
the matrix with the initial buffer. We determine that 
this phase of washing is complete by plating aliquots of 
the washes or by measuring the optical density (at 260 
nm or 280 nm). The matrix is then eluted with a 
gradient of increasing: a) salt, b) [H+] (decreasing 
pH) , c) neutral solutes, d) temperature (increasing or 
decreasing), or e) some combination of these factors. 
The solutes in each of the first three gradients have 
been found generally to weaken non-covalent interactions 
between proteins and bound molecules. Salt is a 
preferred solute for gradient formation in most cases. 
Decreasing pH is also a highly preferred eluant. In 
some cases, the preferred matrix is not stable to low pH 
so that salt and urea are the most preferred reagents. 
Other solutes that generally weaken non-covalent 
interaction between proteins and the target material of 
interest may also be used. 

The uneluted genetic packages contain DNA encoding 
binding domains which have a sufficiently high affinity 
for the target material to resist the elution 
conditions. The DNA encoding such successful binding 
domains may be recovered in a variety of ways. 
Preferably, the bound genetic packages are simply eluted 
by means of a change in the elution conditions. 
Alternatively, one may culture the genetic package in 
situ, or extract the target-containing matrix with 
phenol (or other suitable solvent) and amplify the DNA 
by PCR or by recombinant DNA techniques. Additionally, 
if a site for a specific protease has been engineered 
into the display vector, the specific protease is used 
to cleave the binding domain from the GP. 

V.H. Optimization of Affinity Chromatography Separation: 

For linear gradients, elution volume and eluant 
concentration are directly related. Changes in eluant 
concentration cause GPs to elute from the column. 
Elution volume, however, is more easily measured and 
specified. It is to be understood that the eluant 
concentration is the agent causing GP release and that 
an eluant concentration can be calculated from an 
elution volume and the specified gradient. 

Using a specified elution regime, we compare the 
elution volumes of GP(IPBD)s with the elution volumes of 
wtGP on affinity columns supporting AfM(IPBD). 
Comparisons are made at various: a) amounts of IPBD/GP, 
b) densities of AfM(IPBD)/(volume of matrix) (DoAMoM), c) 
initial ionic strengths, d) elution rates, e) amounts of 
GP/(volume of support), f) pHs, and g) temperatures, 
because these are the parameters most likely to affect 
the sensitivity and efficiency of the separation. We 
then pick those conditions giving the best separation. 

We do not optimize pH or temperature; rather we 
record optimal values for the other parameters for one 
or more values of pH and temperature. The pH used must 
be within the range of pH for which GP(IPBD) binds the 
AfM(IPBD) that is being used in this step. The 
conditions of intended use specified by the user may 
include a specification of pH or temperature. If pH is 
specified, then pH will not be varied in eluting the 
column. Decreasing pH may, however, be used to liberate 
bound GPs from the matrix. Similarly, if the intended 
use specifies a temperature, we will hold the affinity 
column at the specified temperature during elution, but 
we might vary the temperature during recovery. If the 
intended use specifies the pH or temperature, then we 
prefer that the affinity separation be optimized for all 
other parameters at the specified pH and temperature. 

In the optimization devised in this step, we 
preferably use a molecule known to have moderate 
affinity for the IPBD (Kd in the range 10⁻⁶ M to 10⁻⁸ M), 
for the following reason. When populations of 
GP(vgPBD)s are fractionated, there will be roughly three 
subpopulations: a) those with no binding, b) those that 
have some binding but can be washed off with high salt 
or low pH, and c) those that bind very tightly and are 
most easily rescued in situ. We optimize the parameters 
to separate (a) from (b) rather than (b) from (c). Let 
PBD_W be a PBD having weak binding to the target and PBD_S 
be a PBD having strong binding. Higher DoAMoM might, 
for example, favor retention of GP(PBD_W) but also make it 
very difficult to elute viable GP(PBD_S). We will 
optimize the affinity separation to retain GP(PBD_W) 
rather than to allow release of GP(PBD_S) because a 
tightly bound GP(PBD_S) can be rescued by in situ growth. 
If we find that DoAMoM strongly affects the elution 
volume, then in Part III we may reduce the amount of 
target on the affinity column when an SBD has been found 
with moderately strong affinity (Kd on the order of 10⁻⁷ 
M) for the target. 

In case the promoter of the osp-ipbd gene is not 
regulated by a chemical inducer, we optimize DoAMoM, the 
elution rate, and the amount of GP/volume of matrix. If 
the optimized affinity separation is acceptable, we 
proceed. If not, we develop a means to alter the amount 
of IPBD per GP. Among GPs considered in the present 
invention, this case could arise only for spores because 
regulatable promoters are available for all other 
systems. 

If the amount of IPBD/spore is too high, we could 
engineer an operator site into the osp-ipbd gene. We 
choose the operator sequence such that a repressor 
sensitive to a small diffusible inducer recognizes the 
operator. Alternatively, we could alter the 
Shine-Dalgarno sequence to produce a lower homology with 
consensus Shine-Dalgarno sequences. If the amount of 
IPBD/spore is too low, we can introduce variability into 
the promoter or Shine-Dalgarno sequences and screen 
colonies for higher amounts of IPBD/spore. 

In this step, we measure elution volumes of 
genetically pure GPs that elute from the affinity matrix 
as sharp bands that can be detected by UV absorption. 
Alternatively, samples from effluent fractions can be 
plated on suitable medium (cells or spores) or on 
sensitive cells (phage) and colonies or plaques counted. 

Several values of IPBD/GP, DoAMoM, elution rates, 
initial ionic strengths, and loadings should be 
examined. The following is only one of many ways in 
which the affinity separation could be optimized. We 
anticipate that optimal values of IPBD/GP and DoAMoM 
will be correlated and therefore should be optimized 
together. The effects of initial ionic strength, 
elution rate, and amount of GP/(matrix volume) are 
unlikely to be strongly correlated, and so they can be 
optimized independently. 

For each set of parameters to be tested, the column 
is eluted in a specified manner. For example, we may 
use a regime called Elution Regime 1: a KCl gradient 
runs from 10 mM to the maximum allowed for GP(IPBD) 
viability in 100 fractions of 0.05 V_v, followed by 20 
fractions of 0.05 V_v at the maximum allowed KCl; pH of 
the buffer is maintained at the specified value with a 
convenient buffer such as phosphate, Tris, or MOPS. 
Other elution regimes can be used; what is important is 
that the conditions of this optimization be similar to 
the conditions that are used in Part III for selection 
for binding to target and recovery of GPs from the 
chromatographic system. 

When the osp-ipbd gene is regulated by [XINDUCE],
IPBD/GP can be controlled by varying [XINDUCE]. Appropriate
values of [XINDUCE] depend on the identity of
[XINDUCE] and the promoter; if, for example, XINDUCE is
isopropylthiogalactoside (IPTG) and the promoter is
lacUV5, then [IPTG] = 0, 0.1 µM, 1.0 µM, 10.0 µM, 100.0
µM, and 1.0 mM would be appropriate levels to test. The
range of variation of [XINDUCE] is extended until an
optimum is found or an acceptable level of expression is
obtained.

DoAMoM is varied from the maximum that the matrix
material can bind to 1% or 0.1% of this level in appropriate
steps. We anticipate that the efficiency of
separation will be a smooth function of DoAMoM so that 
it is appropriate to cover a wide range of values for 
DoAMoM with a coarse grid and then explore the 
neighborhood of the approximate optimum with a finer 
grid. 
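
The coarse-then-fine scan of DoAMoM described above can be sketched as follows. This is only an illustration: the text asks for "appropriate steps" without fixing them, so the geometric (logarithmic) spacing, the function names log_grid and refine_around, and the number of points are all assumptions introduced here.

```python
def log_grid(high, low, points):
    """Geometrically spaced values from `high` down to `low`, inclusive.

    Geometric spacing is an assumption; the text only calls for
    "appropriate steps" between the maximum and 1% or 0.1% of it.
    """
    ratio = (low / high) ** (1.0 / (points - 1))
    return [high * ratio ** i for i in range(points)]


def refine_around(best, coarse, points=5):
    """Finer grid spanning the coarse-grid neighbours of `best`."""
    i = coarse.index(best)
    upper = coarse[max(i - 1, 0)]                 # next higher coarse value
    lower = coarse[min(i + 1, len(coarse) - 1)]   # next lower coarse value
    return log_grid(upper, lower, points)


# Coarse scan: DoAMoM as a fraction of the maximum the matrix can bind,
# from 100% down to 0.1%.
coarse = log_grid(1.0, 0.001, 4)
# Suppose (hypothetically) the second coarse value separated best.
fine = refine_around(coarse[1], coarse)
```

Each value in the fine grid would then be tested on the column in the same way as the coarse values.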

Several values of initial ionic strength are 
tested, such as 1.0 mM, 5.0 mM, 10.0 mM and 20.0 mM. 
Low ionic strength favors binding between oppositely 
charged groups, but could also cause GP to precipitate. 

The elution rate is varied, by successive factors 
of 1/2, from the maximum attainable rate to 1/16 of this
value. If the lowest elution rate tested gives the best
separation, we test lower elution rates until we find an 
optimum or adequate separation. 

The goal of the optimization is to obtain a sharp 
transition between bound and unbound GPs, triggered by 
increasing salt or decreasing pH or a combination of 
both. This optimization need be performed only: a) for 
each temperature to be used, b) for each pH to be used, 
and c) when a new GP(IPBD) is created. 

V.I. Measuring the sensitivity of affinity separation: 

Once the values of IPBD/GP, DoAMoM, initial ionic
strength, elution rate, and amount of GP/(volume of
affinity support) have been optimized, we determine the
sensitivity of the affinity separation (C_sensi) by the
following procedure that measures the minimum quantity
of GP(IPBD) that can be detected in the presence of a
large excess of wtGP. The user chooses a number of
separation cycles, denoted N_chrom, that will be performed
before an enrichment is abandoned; preferably, N_chrom is
in the range 6 to 10 and N_chrom must be greater than 4.
Enrichment can be terminated by isolation of a desired
GP(SBD) before N_chrom passes.

The measurement of sensitivity is significantly 
expedited if GP(IPBD) and wtGP carry different 
selectable markers because such markers allow easy 
identification of colonies obtained by plating fractions 
obtained from the chromatography column. For example, 
if wtGP carries kanamycin resistance and GP(IPBD)
carries ampicillin resistance, we can plate fractions
from a column on non-selective media suitable for the
GP. Transfer of colonies onto ampicillin- or kanamycin-
containing media will determine the identity of each
colony.

Mixtures of GP(IPBD) and wtGP are prepared in the
ratios of 1:V_lim, where V_lim ranges by an appropriate
factor (e.g. 1/10) over an appropriate range, typically
10^11 through 10^4. Large values of V_lim are tested first;
once a positive result is obtained for one value of V_lim,
no smaller values of V_lim need be tested. Each mixture
is applied to a column supporting, at the optimal
DoAMoM, an AfM(IPBD) having high affinity for IPBD and
the column is eluted by the specified elution regime,
such as Elution Regime 1. The last fraction that
contains viable GPs and an inoculum of the column matrix
material are cultured. If GP(IPBD) and wtGP have
different selectable markers, then transfer onto
selection plates identifies each colony. If GP(IPBD)
and wtGP have no selectable markers or the same
selectable markers, then a number (e.g. 32) of GP clonal
isolates are tested for presence of IPBD. If IPBD is
not detected on the surface of any of the isolated GPs,
then GPs are pooled from: a) the last few (e.g. 3 to 5)
fractions that contain viable GPs, and b) an inoculum
taken from the column matrix. The pooled GPs are
cultured and passed over the same column and enriched
for GP(IPBD) in the manner described. This process is
repeated until N_chrom passes have been performed, or until
the IPBD has been detected on the GPs. If GP(IPBD) is
not detected after N_chrom passes, V_lim is decreased and the
process is repeated.

Once a value for V_lim is found that allows recovery
of GP(IPBD)s, the factor by which V_lim is varied is
reduced and additional values are tested until V_lim is
known to within a factor of two.
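
The search for the limiting dilution can be pictured with the short sketch below. It is illustrative only: `recovered` is a hypothetical stand-in for the whole wet-lab test (up to N_chrom passes at a given dilution, returning whether GP(IPBD) was found), and refining by bisection on a logarithmic scale is one assumed way of "reducing the factor" until the limit is bracketed within a factor of two.

```python
from typing import Callable, Optional, Tuple


def find_v_lim(recovered: Callable[[float], bool],
               start: float = 1e11,
               stop: float = 1e4,
               coarse_factor: float = 10.0) -> Optional[Tuple[float, float]]:
    """Bracket the limiting dilution within a factor of two.

    Returns (low, high): `low` is the largest dilution tested at which
    GP(IPBD) was recovered, `high` the smallest at which it was not.
    Returns None if no tested dilution allowed recovery.
    """
    v = start
    failed_above = None
    # Coarse pass: largest dilutions first, stop at the first success.
    while v >= stop:
        if recovered(v):
            break
        failed_above = v
        v /= coarse_factor
    else:
        return None  # the loop ran out without any positive result
    low, high = v, failed_above
    if high is None:
        return (low, low)  # even the largest dilution tested succeeded
    # Refinement (assumed scheme): bisect on a log scale.
    while high / low > 2.0:
        mid = (low * high) ** 0.5
        if recovered(mid):
            low = mid
        else:
            high = mid
    return (low, high)


# Toy stand-in for the experiment: recovery succeeds up to 1 : 3e8.
bracket = find_v_lim(lambda v: v <= 3e8)
```

With the toy stand-in, the coarse pass fails at 10^11, 10^10 and 10^9, succeeds at 10^8, and two refinement tests then narrow the bracket to within a factor of two.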

C_sensi equals the highest value of V_lim for which the
user can recover GP(IPBD) within N_chrom passes. The
number of chromatographic cycles (K_cyc) that were needed
to isolate GP(IPBD) gives a rough estimate of C_eff; C_eff
is approximately the K_cyc-th root of V_lim:

C_eff ≈ exp{ log_e(V_lim) / K_cyc }

For example, if V_lim were 4.0 x 10^8 and three
separation cycles were needed to isolate GP(IPBD), then
C_eff ≈ 736.
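
As a quick check of the arithmetic in this example, the rough estimate can be computed directly. The function name below is introduced here for illustration only.

```python
import math


def estimate_c_eff(v_lim: float, k_cyc: int) -> float:
    """Rough efficiency estimate: the k_cyc-th root of v_lim."""
    return math.exp(math.log(v_lim) / k_cyc)


# Values from the example in the text: V_lim = 4.0 x 10^8, three cycles.
c_eff = estimate_c_eff(4.0e8, 3)
```

The result lies between 736 and 737, consistent with the value quoted above.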

V.J. Measuring the efficiency of separation:

To determine C_eff more accurately, we determine the
ratio of GP(IPBD)/wtGP loaded onto an AfM(IPBD) column
that yields approximately equal amounts of GP(IPBD) and
wtGP after elution. We prepare mixtures of GP(IPBD) and
wtGP in ratios GP(IPBD):wtGP :: 1:Q; we start Q at
twenty times the approximate C_eff found above. A 1:Q
mixture of GP(IPBD) and wtGP is applied to an AfM(IPBD)
column and eluted by the specified elution regime, such
as Elution Regime 1. A sample of the last fraction that
contains viable GPs is plated at a dilution that gives
well separated colonies or plaques. The presence of
IPBD or the osp-ipbd gene in each colony or plaque can
be determined by a number of standard methods,
including: a) use of different selectable markers, b)
nitrocellulose filter lift of GPs and detection with
AfM(IPBD)* (AUSU87), or c) nitrocellulose filter lift of
GPs and detection with radiolabeled DNA that is
complementary to the osp-ipbd gene (AUSU87). Let F be
the fraction of GP(IPBD) colonies found in the last
fraction containing viable GPs. When a Q is found such
that 0.20 < F < 0.80, then

C_eff = Q * F.

If F < 0.2, then we reduce Q by an appropriate
factor (e.g. 1/10) and repeat the procedure. If
F > 0.8, then we increase Q by an appropriate factor
(e.g. 2) and repeat the procedure.
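
The adjustment of Q can be summarized in a short loop. This is a sketch under stated assumptions: `measure_fraction` is a hypothetical stand-in for running the column at a given Q and counting colonies, the cap on the number of rounds is added here as a safeguard, and the toy response curve used to exercise the loop is invented purely for illustration.

```python
from typing import Callable, Optional


def refine_c_eff(measure_fraction: Callable[[float], float],
                 q_start: float,
                 max_rounds: int = 20) -> Optional[float]:
    """Adjust Q until F falls between 0.2 and 0.8, then return Q * F.

    Follows the rule in the text: divide Q by 10 when F < 0.2 and
    double Q when F > 0.8. Returns None if no acceptable Q is found
    within max_rounds (the cap is an assumption, not from the text).
    """
    q = q_start
    for _ in range(max_rounds):
        f = measure_fraction(q)
        if f < 0.2:
            q /= 10.0      # too few GP(IPBD) colonies: load less wtGP
        elif f > 0.8:
            q *= 2.0       # too many GP(IPBD) colonies: load more wtGP
        else:
            return q * f
    return None


# Toy response curve (invented for illustration): F falls as Q rises.
toy_column = lambda q: 500.0 / (500.0 + q)
# Start at twenty times the rough estimate of 736 from the example above.
c_eff_refined = refine_c_eff(toy_column, 20 * 736.0)
```

With this toy curve the first test gives F below 0.2, Q is reduced tenfold to 1472, and the second test gives an acceptable F of about 0.25.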

V.K. Reducing selection due to non-specific binding: 

When affinity chromatography is used for separating
bound and unbound GPs, we may reduce non-specific
binding of GP(PBD)s to the matrix that bears the target
in the following ways:

1) we treat the column with blocking agents such as
genetically defective GPs or a solution of protein
before the population of GP(vgPBD)s is
chromatographed, and

2) we pass the population of GP(vgPBD)s over a matrix
containing no target or a different target from the
same class as the actual target prior to affinity
chromatography.

Step (1) above saturates any non-specific binding that
the affinity matrix might show toward wild-type GPs or
proteins in general; step (2) removes components of our
population that exhibit non-specific binding to the
matrix or to molecules of the same class as the target.
If the target were horse heart myoglobin, for example, a
column supporting bovine serum albumin could be used to
trap GPs exhibiting PBDs with strong non-specific
binding to proteins. If cholesterol were the target,
then a hydrophobic compound, such as
p-tertiarybutylbenzyl alcohol, could be used to remove GPs
displaying PBDs having strong non-specific binding to
hydrophobic compounds. It is anticipated that PBDs that
fail to fold or that are prematurely terminated will be
non-specifically sticky. These sequences could
outnumber the PBDs having desirable binding properties.
Thus, the capacity of the initial column that removes
indiscriminately adhesive PBDs should be greater (e.g. 5
fold greater) than the column that supports the target
molecule.

Variation in the support material (polystyrene,
glass, agarose, cellulose, etc.) in analysis of clones
carrying SBDs is used to eliminate enrichment for
packages that bind to the support material rather than
the target.

FACS may be used to separate GPs that bind
fluorescent labeled target. We discriminate against
artifactual binding to the fluorescent label by using
two or more different dyes, chosen to be structurally
different. GPs isolated using target labeled with a
first dye are cultured. These GPs are then tested with
target labeled with a second dye.

Electrophoretic affinity separation uses unaltered
target so that only other ions in the buffer can give
rise to artifactual binding. Artifactual binding to the
gel material gives rise to retardation independent of
field direction and so is easily eliminated.

A variegated population of GPs will have a variety
of charges. The following 2D electrophoretic procedure
accommodates this variation in the population. First
the variegated population of GPs is electrophoresed in a
gel that contains no target material. The
electrophoresis continues until the GPs are distributed
along the length of the lane. The gels described by
Serwer for phage are very low in agarose and lack
mechanical stability. The target-free lane in which the
initial electrophoresis is conducted is separated from a
square of gel that contains target material by a
removable baffle. After the first pass, the baffle is
removed and a second electrophoresis is conducted at
right angles to the first. GPs that do not bind target
migrate with unaltered mobility while GPs that do bind
target will separate from the majority that do not bind
target. A diagonal line of non-binding GPs will form.
This line is excised and discarded. Other parts of the
gel are dissolved and the GPs cultured.

V.L. Isolation of GP(PBD)s with binding-to-target
phenotypes:

The harvested packages are now enriched for the
binding-to-target phenotype by use of affinity
separation involving the target material immobilized on 
an affinity matrix. Packages that fail to bind to the 
target material are washed away. If the packages are 
bacteriophage or endospores, it may be desirable to 
include a bacteriocidal agent, such as azide, in the 
buffer to prevent bacterial growth. The buffers used in 
chromatography include: a) any ions or other solutes 
needed to stabilize the target, and b) any ions or other 
solutes needed to stabilize the PBDs derived from the 
IPBD. 

V.M. Recovery of packages: 

Recovery of packages that display binding to an 
affinity column may be achieved in several ways, 
including:

1) collect fractions eluted from the column with a 
gradient as described above; fractions eluting 
later in the gradient contain GPs more enriched for 
genes encoding PBDs with high affinity for the 
column, 

2) elute the column with the target material in
soluble form,

3) flood the matrix with a nutritive medium and grow
the desired packages in situ,

4) remove parts of the matrix and use them to 
inoculate growth medium, 

5) chemically or enzymatically degrade the linkage 
holding the target to the matrix so that GPs still 
bound to target are eluted, or 

6) degrade the packages and recover DNA with phenol or 
other suitable solvent; the recovered DNA is used 
to transform cells that regenerate GPs. 

It is possible to utilize combinations of these methods. 
It should be remembered that what we want to recover 
from the affinity matrix is not the GPs per se, but the 
information in them. Recovery of viable GPs is very 
strongly preferred, but recovery of genetic material is 
essential. If cells, spores, or virions bind
irreversibly to the matrix but are not killed, we can
recover the information through in situ cell division, 
germination, or infection respectively. Proteolytic 
degradation of the packages and recovery of DNA is not 
preferred. 

Although degradation of the bound GPs and recovery 
of genetic material is a possible mode of operation, 
inadvertent inactivation of the GPs is very deleterious.
It is preferred that maximum limits for solutes
that do not inactivate the GPs or denature the target or
the column are determined. If the affinity matrices are
expendable, one may use conditions that denature the
column to elute GPs; before the target is denatured, a
portion of the affinity matrix should be removed for 
possible use as an inoculum. As the GPs are held 
together by protein-protein interactions and other non- 
covalent molecular interactions, there will be cases in 
which the molecular package will bind so tightly to the 
target molecules on the affinity matrix that the GPs can 
not be washed off in viable form. This will only occur 
when very tight binding has been obtained. In these 
cases, methods (3) through (5) above can be used to 
obtain the bound packages or the genetic messages from 
the affinity matrix. 

It is possible, by manipulation of the elution
conditions, to isolate SBDs that bind to the target at
one pH (pH_b) but not at another pH (pH_o). The population
is applied at pH_b and the column is washed thoroughly at
pH_b. The column is then eluted with buffer at pH_o and
GPs that come off at the new pH are collected and 
cultured. Similar procedures may be used for other 
solution parameters, such as temperature. For example, 
GP(vgPBD)s could be applied to a column supporting 
insulin. After eluting with salt to remove GPs with 
little or no binding to insulin, we elute with salt and 
glucose to liberate GPs that display PBDs that bind 
insulin or glucose in a competitive manner.

V.N. Amplifying the Enriched Packages

Viable GPs having the selected binding trait are 
amplified by culture in a suitable medium, or, in the
case of phage, infection into a host so cultivated. If
the GPs have been inactivated by the chromatography, the
OCV carrying the osp-pbd gene are recovered from the GP,
and introduced into a new, viable host.

V.O. Determining whether further enrichment is needed: 

The probability of isolating a GP with improved
binding increases by a factor of C_eff with each
separation cycle. Let N be the number of distinct
amino-acid sequences produced by the variegation. We want
to perform K separation cycles before attempting to
isolate an SBD, where K is such that the probability of
isolating a single SBD is 0.10 or higher.

K = the smallest integer >= log10(0.10 N) / log10(C_eff)

For example, if N were 1.0 x 10^7 and C_eff = 6.31 x 10^2,
then log10(1.0 x 10^6) / log10(6.31 x 10^2) = 6.0000/2.8000 =
2.14. Therefore we would attempt to isolate SBDs after
the third separation cycle. After only two separation
cycles, the probability of finding an SBD is

(6.31 x 10^2)^2 / (1.0 x 10^7) = 0.04

and attempting to isolate SBDs might be profitable.
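
The arithmetic of this example can be reproduced with the sketch below. The function names are introduced here for illustration, and capping the probability at 1.0 is an added assumption so that the estimate stays meaningful after many cycles.

```python
import math


def cycles_before_isolation(n_sequences: float, c_eff: float,
                            target_prob: float = 0.10) -> int:
    """Smallest integer K with K >= log10(target_prob * N) / log10(C_eff)."""
    return math.ceil(math.log10(target_prob * n_sequences) / math.log10(c_eff))


def probability_after(cycles: int, n_sequences: float, c_eff: float) -> float:
    """Estimated chance of picking an SBD after `cycles` separation cycles."""
    return min(1.0, c_eff ** cycles / n_sequences)  # cap at 1.0 is an assumption


# Values from the example in the text: N = 1.0 x 10^7, C_eff = 6.31 x 10^2.
k = cycles_before_isolation(1.0e7, 6.31e2)
p_two = probability_after(2, 1.0e7, 6.31e2)
```

Here k evaluates to 3 and p_two to roughly 0.04, matching the figures given above.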

Clonal isolates from the last fraction eluted which 
contained any viable GPs, as well as clonal isolates 
obtained by culturing an inoculum taken from the 
affinity matrix, are cultured in a growth step that is 
similar to that described previously. Other fractions 
may be cultured too. If K separation cycles have been 
completed, samples from a number, e.g. 32, of these 
clonal isolates are tested for elution properties on the
{target} column. If none of the isolated, genetically
pure GPs show improved binding to target, or if K cycles 
have not yet been completed, then we pool and culture, 
in a manner similar to the manner set forth previously, 
the GPs from the last few fractions eluted that 
contained viable GPs and from the GPs obtained by 
culturing an inoculum taken from the column matrix. We 
then repeat the enrichment procedure described above. 
This cyclic enrichment may continue for N_chrom passes or until
an SBD is isolated. 

If one or more of the isolated GPs has improved 
retention on the {target} column, we determine whether 
the retention of the candidate SBDs is due to affinity 
for the target material as follows. A second column is 
prepared using a different support matrix with the target
material bound at the optimal density. The
elution volumes, under the same elution conditions as
used previously, of candidate GP(SBD)s are compared to
each other and to GP(PPBD of this round). If one or
more candidate GP(SBD)s has a larger elution volume than 
GP(PPBD of this round), then we pick the GP(SBD) having 
the highest elution volume and proceed to characterize 
the population. If none of the candidate GP(SBD)s has 
higher elution volume than GP(PPBD of this round), then 
we pool and culture, in a manner similar to the manner 
used previously, the GPs from the last few fractions 
that contained viable GPs and the GPs obtained by
culturing an inoculum taken from the column matrix. We
then repeat the enrichment procedure.

If all of the SBDs show binding that is superior to 
PPBD of this round, we pool and culture the GPs from the 
last fraction that contains viable GPs and from the 
inoculum taken from the column. This population is re-
chromatographed for at least one pass to fractionate
further the GPs based on K_d.

If an RNA phage were used as GP, the RNA would 
either be cultured with the assistance of a helper phage 
or be reverse transcribed and the DNA amplified. The 
amplified DNA could then be sequenced or subcloned into 
suitable plasmids.

V.P. Characterizing the Putative SBDs: 

We characterize members of the population showing 
desired binding properties by genetic and biochemical 
methods. We obtain clonal isolates and test these 
strains by genetic and affinity methods to determine 
genotype and phenotype with respect to binding to 
target. For several genetically pure isolates that show 
binding, we demonstrate that the binding is caused by 
the artificial chimeric gene by excising the osp-sbd 
gene and crossing it into the parental GP. We also 
ligate the deleted backbone of each GP from which the 
osp-sbd is removed and demonstrate that each backbone 
alone cannot confer binding to the target on the GP. We 
sequence the osp-sbd gene from several clonal isolates. 
Primers for sequencing are chosen from the DNA flanking
the osp-ppbd gene or from parts of the osp-ppbd gene
that are not variegated. 

The present invention is not limited to a single 
method of determining protein sequences, and reference 
in the appended claims to determining the amino acid 
sequence of a domain is intended to include any 
practical method or combination of methods, whether 
direct or indirect. The preferred method, in most 
cases, is to determine the sequence of the DNA that 
encodes the protein and then to infer the amino acid 
sequence. In some cases, standard methods of protein-
sequence determination may be needed to detect post-
translational processing.

The present invention is not limited to a single 
method of determining the sequence of nucleotides (nts) 
in DNA subsequences. In the preferred embodiment,
plasmids are isolated and denatured in the presence of a
sequencing primer, about 20 nts long, that anneals to a
region adjacent, on the 5' side, to the region of
interest. This plasmid is then used as the template in
the four sequencing reactions with one dideoxy substrate
in each. Sequencing reactions, agarose gel
electrophoresis, and polyacrylamide gel electrophoresis
(PAGE) are performed by standard procedures (AUSU87).

For one or more clonal isolates, we may subclone 
the sbd gene fragment, without the osp fragment, into an 
expression vector such that each SBD can be produced as 
a free protein. Because numerous unique restriction
sites were built into the inserted domain, it is easy to
subclone the gene at any time. Each SBD protein is
purified by normal means, including affinity
chromatography. Physical measurements of the strength of
binding are then made on each free SBD protein by one of
the following methods: 1) alteration of the Stokes 
radius as a function of binding of the target material, 
measured by characteristics of elution from a molecular 
sizing column such as agarose, 2) retention of 
radiolabeled binding protein on a spun affinity column 
to which has been affixed the target material, or 3) 
retention of radiolabeled target material on a spun 
affinity column to which has been affixed the binding 
protein. The measurements of binding for each free SBD 
are compared to the corresponding measurements of 
binding for the PPBD. 

In each assay, we measure the extent of binding as 
a function of concentration of each protein, and other 
relevant physical and chemical parameters such as salt 
concentration, temperature, pH, and prosthetic group 
concentrations (if any).

In addition, the SBD with highest affinity for the 
target from each round is compared to the best SBD of 
the previous round (IPBD for the first round) and to the 
IPBD (second and later rounds) with respect to affinity 
for the target material. Successive rounds of
mutagenesis and selection-through-binding yield
increasing affinity until desired levels are achieved.

If we find that the binding is not yet sufficient,
we decide which residues to vary next. If the binding
is sufficient, then we now have an expression vector
bearing a gene encoding the desired novel binding
protein.

V.Q. Joint selections: 

One may modify the affinity separation of the 
method described to select a molecule that binds to 
material A but not to material B. One needs to prepare 
two selection columns, one with material A and the other 
with material B. The population of genetic packages is 
prepared in the manner described, but before applying 
the population to A, one passes the population over the 
B column so as to remove those members of the population 
that have high affinity for B ("reverse affinity 
chromatography"). In the preceding specification, the 
initial column supported some other molecule simply to 
remove GP(PBD)s that displayed PBDs having 
indiscriminate affinity for surfaces. 

It may be necessary to amplify the population that 
does not bind to B before passing it over A.
Amplification would most likely be needed if A and B were in
some ways similar and the PPBD has been selected for
having affinity for A. The optimum order of
interactions might be determined empirically. For example, to
obtain an SBD that binds A but not B, three columns
could be connected in series: a) a column supporting
some compound, neither A nor B, or only the matrix
material, b) a column supporting B, and c) a column
supporting A. A population of GP(vgPBD)s is applied to
the series of columns and the columns are washed with 
the buffer of constant ionic strength that is used in 
the application. The columns are uncoupled, and the 
third column is eluted with a gradient to isolate 
GP(PBD)s that bind A but not B. 

One can also generate molecules that bind to both A 
and B. In this case we can use a 3D model and mutate 
one face of the molecule in question to get binding to
A. One can then mutate a different face to produce
binding to B. When an SBD binds at least somewhat to 
both A and B, one can mutate the chain by Diffuse 
Mutagenesis to refine the binding and use a sequential 
joint selection for binding to both A and B. 

The materials A and B could be proteins that differ 
at only one or a few residues. For example, A could be 
a natural protein for which the gene has been cloned and 
B could be a mutant of A that retains the overall 3D 
structure of A. SBDs selected to bind A but not B 
probably bind to A near the residues that are mutated in
B. If the mutations were picked to be in the active
site of A (assuming A has an active site), then an SBD
that binds A but not B will bind to the active site of A 
and is likely to be an inhibitor of A. 

To obtain a protein that will bind to both A and B, 
we can, alternatively, first obtain an SBD that binds A 
and a different SBD that binds B. We can then combine
the genes encoding these domains so that a two-domain
single-polypeptide protein is produced. The fusion 
protein will have affinity for both A and B because one 
of its domains binds A and the other binds B. 

One can also generate binding proteins with 
affinity for both A and B, such that these materials 
will compete for the same site on the binding protein. 
We guarantee competition by overlapping the sites for A 
and B. Using the procedures of the present invention, 
we first create a molecule that binds to target material 
A. We then vary a set of residues defined as: a) those 
residues that were varied to obtain binding to A, plus 
b) those residues close in 3D space to the residues of 
set (a) but that are internal and so are unlikely to 
bind directly to either A or B. Residues in set (b) are 
likely to make small changes in the positioning of the 
residues in set (a) such that the affinities for A and B 
will be changed by small amounts. Members of these 
populations are selected for affinity to both A and B.

V.R. Selection for non-binding:

The method of the present invention can be used to 
select proteins that do not bind to selected targets. 
Consider a protein of pharmacological importance, such 
as streptokinase, that is antigenic to an undesirable 
extent. We can take the pharmacologically important 
protein as IPBD and antibodies against it as target. 
Residues on the surface of the pharmacologically 
important protein would be variegated and GP(PBD)s that
do not bind to an antibody column would be collected and
cultured. Surface residues may be identified in several 
ways, including: a) from a 3D structure, b) from 
hydrophobicity considerations, or c) chemical labeling. 
The 3D structure of the pharmacologically important 
protein remains the preferred guide to picking residues 
to vary, except now we pick residues that are widely 
spaced so that we leave as little as possible of the 
original surface unaltered. 

Destroying binding frequently requires only that a 
single amino acid in the binding interface be changed. 
If polyclonal antibodies are used, we face the problem 
that all or most of the strong epitopes must be altered 
in a single molecule. Preferably, one would have a set 
of monoclonal antibodies, or a narrow range of antibody 
species. If we had a series of monoclonal antibody 
columns, we could obtain one or more mutations that 
abolish binding to each monoclonal antibody. We could 
then combine some or all of these mutations in one 
molecule to produce a pharmacologically important
protein recognized by none of the monoclonal antibodies.
Such mutants are tested to verify that the
pharmacologically interesting properties have not been
altered to an unacceptable degree by the mutations.

Typically, polyclonal antibodies display a range of 
binding constants for antigen. Even if we have only 
polyclonal antibodies that bind to the pharmacologically 
important protein, we may proceed as follows. We
engineer the pharmacologically important protein to
appear on the surface of a replicable GP. We introduce 
mutations into residues that are on the surface of the 
pharmacologically important protein or into residues 
thought to be on the surface of the pharmacologically 
important protein so that a population of GPs is 
obtained. Polyclonal antibodies are attached to a 
column and the population of GPs is applied to the 
column at low salt. The column is eluted with a salt 
gradient. The GPs that elute at the lowest
concentration of salt are those which bear
pharmacologically important proteins that have been 
mutated in a way that eliminates binding to the 
antibodies having maximum affinity for the 
pharmacologically important protein. The GPs eluting at 
the lowest salt are isolated and cultured. The isolated 
SBD becomes the PPBD to further rounds of variegation so 
that the antigenic determinants are successively 
eliminated. 

V.S. Selection of PBDs for retention of structure: 

Let us take an SBD with known affinity for a target 
as PPBD to a variegation of a region of the PBD that is 
far from the residues that were varied to create the 
SBD. We can use the target as an affinity molecule to 
select the PBDs that retain binding for the target, and 
that presumably retain the underlying structure of the 
IPBD. The variegations in this case could include 
insertions and deletions that are likely to disrupt the
IPBD structure. We could also use the IPBD and
AfM(IPBD) in the same way.

For example, if IPBD were BPTI and AfM(BPTI) were
trypsin, we could introduce four or five additional
residues after residue 26 and select GPs that display
PBDs having specific affinity for AfM(BPTI). Residue 26
is chosen because it is in a turn and because it is
about 25 Å from K15, a key amino acid in binding to
trypsin.

The underlying structure is most likely to be 
retained if insertions or deletions are made at loops or
turns.

V.T. Engineering of Antagonists 

It may be desirable to provide an antagonist to an 
enzyme or receptor. This may be achieved by making a 
molecule that prevents the natural substrate or agonist 
from reaching the active site. Molecules that bind 
directly to the active site may be either agonists or 
antagonists. Thus we adopt the following strategy. We 
consider enzymes and receptors together under the 
designation TER (Target Enzyme or Receptor).

For most TERs, there exist chemical inhibitors that 
block the active site. Usually, these chemicals are 
useful only as research tools due to high toxicity.
We make two affinity matrices: one with active TER and
one with blocked TER. We make a variegated population
of GP(PBD)s and select for SBDs that bind to both forms
of the enzyme, thereby obtaining SBDs that do not bind
to the active site. We expect that SBDs will be found
that bind different places on the enzyme surface. Pairs 
of the sbd genes are fused with an intervening peptide 
segment. For example, if SBD-1 and SBD- 2 are binding 
domains that show high affinity for the target enzyme 
and for which the binding is non-competitive, then the 
gene sbd- 1: : linker: :sbd-2 encodes a two-domain protein 
that will show high affinity for the target. We make 
several fusions having a variety of SBDs and various 
linkers. Such compounds have a reasonable probability 
of being an antagonist to the target enzyme. 
VI. EXPLOITATION OF SUCCESSFUL BINDING DOMAINS AND 
CORRESPONDING DNAS 

VI.A. Generally

Using the method of the present invention, we can 
obtain a replicable genetic package that displays a 
novel protein domain having high affinity and
specificity for a target material of interest. Such a package
carries both amino-acid embodiments of the binding 
protein domain and a DNA embodiment of the gene encoding 
the novel binding domain. The presence of the DNA 
facilitates expression of a protein comprising the novel 
binding protein domain within a high-level expression 
system, which need not be the same system used during 
the developmental process. 

VI.B. Production of Novel Binding Proteins

We can proceed to production of the novel binding
protein in several ways, including: a) altering the
gene encoding the binding domain so that the binding
domain is expressed as a soluble protein, not attached
to a genetic package (either by deleting codons 5' of
those encoding the binding domain or by inserting stop
codons 3' of those encoding the binding domain), b)
moving the DNA encoding the binding domain into a known
expression system, and c) utilizing the genetic package
as a purification system. (If the domain is small
enough, it may be feasible to prepare it by conventional
peptide synthesis methods.)

Option (c) may be illustrated as follows. Assume
that a novel BPTI derivative has been obtained by
selection of M13 derivatives in which a population of
BPTI-derived domains is displayed as fusions to mature
coat protein. Assume that a specific protease cleavage
site (e.g., that of activated clotting factor X) is
engineered into the amino-acid sequence between the
carboxy terminus of the BPTI-derived domain and the
mature coat domain. Furthermore, we alter the display
system to maximize the number of fusion proteins
displayed on each phage. The desired phage can be
produced and purified, for example by centrifugation, so
that no bacterial products remain. Treatment of the
purified phage with a catalytic amount of activated
factor X cleaves the binding domains from the phage
particles. A second centrifugation step separates the
cleaved protein from the phage, leaving a very pure
protein preparation.

VI.C. Mini-Protein Production

As previously mentioned, an advantage inhering from
the use of a mini-protein as an IPBD is that it is
likely that the derived SBD will also behave like a
mini-protein and will be obtainable by means of chemical
synthesis. (The term "chemical synthesis", as used
herein, includes the use of enzymatic agents in a
cell-free environment.)

It is also to be understood that mini-proteins
obtained by the method of the present invention may be
taken as lead compounds for a series of homologues that
contain non-naturally occurring amino acids and groups
other than amino acids. For example, one could
synthesize a series of homologues in which each member
of the series has one amino acid replaced by its D
enantiomer. One could also make homologues containing
constituents such as β-alanine, aminobutyric acid,
3-hydroxyproline, 2-aminoadipic acid, N-ethylasparagine,
norvaline, etc.; these would be tested for binding and
other properties of interest, such as stability and
toxicity.
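
The D-enantiomer series described above can be enumerated
mechanically. The sketch below is illustrative only: the
five-residue peptide is a hypothetical placeholder, and
writing a D residue in lower case is a notational
assumption of this example. Glycine is skipped because it
is achiral.

```python
def d_enantiomer_scan(peptide):
    """Return homologues in which exactly one residue is its D enantiomer.

    Residues are one-letter codes; a lower-case letter marks a D residue.
    """
    series = []
    for i, residue in enumerate(peptide):
        if residue == "G":  # glycine has no D enantiomer
            continue
        series.append(peptide[:i] + residue.lower() + peptide[i + 1:])
    return series


# Hypothetical lead peptide, used only to show the shape of the series.
series = d_enantiomer_scan("ACGDK")
```
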

Peptides may be chemically synthesized either in
solution or on supports. Various combinations of
stepwise synthesis and fragment condensation may be
employed.

During synthesis, the amino acid side chains are
protected to prevent branching. Several different
protective groups are useful for the protection of the
thiol groups of cysteines:

1) 4-methoxybenzyl (MBzl; Mob) (NISH82; ZAFA88),
removable with HF;

2) acetamidomethyl (Acm) (NISH82; NISH86; BECK89c),
removable with iodine, mercury ions (e.g., mercuric
acetate), or silver nitrate; and

3) S-para-methoxybenzyl (HOUG84).

Other thiol protective groups may be found in
standard reference works such as Greene, PROTECTIVE
GROUPS IN ORGANIC SYNTHESIS (1981).

Once the polypeptide chain has been synthesized,
disulfide bonds must be formed. Possible oxidizing
agents include air (HOUG84; NISH86), ferricyanide
(NISH82; HOUG84), iodine (NISH82), and performic acid
(HOUG84). Temperature, pH, solvent, and chaotropic
chemicals may affect the course of the oxidation.

A large number of mini-proteins with a plurality of
disulfide bonds have been chemically synthesized in
biologically active form: conotoxin GI (13 AA, 4 Cys)
(NISH82); heat-stable enterotoxin ST (18 AA, 6 Cys)
(HOUG84); analogues of ST (BHAT86); ω-conotoxin GVIA
(27 AA, 6 Cys) (NISH86; RIVI87b); ω-conotoxin MVIIA (27
AA, 6 Cys) (OLIV87b); α-conotoxin SI (13 AA, 4 Cys)
(ZAFA88); μ-conotoxin IIIA (22 AA, 6 Cys) (BECK89c,
CRUZ89, HATA90). Sometimes, the polypeptide naturally
folds so that the correct disulfide bonds are formed.
Other times, it must be helped along by use of a
differently removable protective group for each pair of
cysteines.
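
To see why directed pairing can be necessary, it helps to
count how many distinct ways a set of cysteines can pair
up completely. The short sketch below (an illustration
added here, not part of the cited syntheses) computes
(2n)!/(2^n n!) for 2n cysteines.

```python
from math import factorial


def disulfide_pairings(n_cys):
    """Number of ways to pair all cysteines into disulfides (n_cys even)."""
    if n_cys < 0 or n_cys % 2:
        raise ValueError("need a non-negative, even number of cysteines")
    n_bonds = n_cys // 2
    return factorial(n_cys) // (2 ** n_bonds * factorial(n_bonds))


four_cys = disulfide_pairings(4)  # e.g. a 13 AA, 4 Cys conotoxin
six_cys = disulfide_pairings(6)   # e.g. a 6 Cys mini-protein
```

A 4-Cys mini-protein thus has 3 possible fully paired
arrangements and a 6-Cys mini-protein has 15, only one of
which is the native pattern.
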

VI.D. Uses of Novel Binding Proteins

The successful binding domains of the present 
invention may, alone or as part of a larger protein, be 
used for any purpose for which binding proteins are 
suited, including isolation or detection of target 
materials. In furtherance of this purpose, the novel
binding proteins may be coupled directly or indirectly,
covalently or noncovalently, to a label, carrier or
support.

When used as a pharmaceutical, the novel binding
proteins may be combined with suitable carriers or
adjuvants.

***** 

All references cited anywhere in this specification
are incorporated by reference to the extent that they
may be pertinent.

EXAMPLE I

DISPLAY OF BPTI AS A FUSION TO M13 GENE VIII PROTEIN: 

Example I involves display of BPTI on M13 as a 
fusion to the mature gene VIII coat protein. Each of 
the DNA constructions was confirmed by restriction 
digestion analysis and DNA sequencing. 

1. Construction of the viii-signal-sequence::bpti::mature-viii-coat-protein Display Vector.

A. Operative cloning vectors (OCV).

The operative cloning vectors are M13 and phagemids
derived from M13 or f1. The initial construction
was in the f1-based phagemid pGEM-3Zf(-)(TM) (Promega
Corp., Madison, WI).

A gene comprising, in order: i) a modified lacUV5
promoter, ii) a Shine-Dalgarno sequence, iii) DNA
encoding the M13 gene VIII signal sequence, iv) a
sequence encoding mature BPTI, v) a sequence encoding
the mature M13 gene VIII coat protein, vi) multiple stop
codons, and vii) a transcription terminator, was
constructed. This gene is illustrated in Tables 101-105;
each table shows the same DNA sequence with
different features annotated. There are a number of
differences between this gene and the one proposed in
the hypothetical example in the generic specification of
the parent application. Because the actual construction
was made in pGEM-3Zf(-), the ends of the synthetic DNA
were made compatible with SalI and BamHI. The lacO
operator of lacUV5 was changed to the symmetrical lacO
with the intention of achieving tighter repression in
the absence of IPTG. Several silent codon changes were
made to minimize the longest segment that is identical
to wild-type gene VIII, so that genetic recombination
with the co-existing gene VIII is unlikely.

i) OCV based upon pGEM-3Zf.

pGEM-3Zf(TM) (Promega Corp., Madison, WI) is a
plasmid-based vector containing the amp gene, bacterial
origin of replication, bacteriophage f1 origin of
replication, a lacZ operon containing a multiple cloning
site sequence, and the T7 and SP6 polymerase binding
sequences.

Two restriction enzyme recognition sites were
introduced, by site-directed oligonucleotide
mutagenesis, at the boundaries of the lacZ operon. This
allowed for the removal of the lacZ operon and its
replacement with the synthetic gene. A BamHI
recognition site (GGATCC) was introduced at the 5' end
of the lacZ operon by the mutation of bases C331 and T332
to G and A respectively (numbering of Promega). A SalI
recognition site (GTCGAC) was introduced at the 3' end
of the operon by the mutation of bases C3021 and T3023 to
G and C respectively. A construct combining these
variants of pGEM-3Zf was designated pGEM-MB3/4.

ii) OCV based upon M13mp18.

M13mp18 (YANI85) is an M13 bacteriophage-based
vector (available from, inter alia, New England Biolabs,
Beverly, MA) consisting of the whole of the phage
genome into which has been inserted a lacZ operon
containing a multiple cloning site sequence (MESS77).
Two restriction enzyme sites were introduced into
M13mp18 using standard methods. A BamHI recognition
site (GGATCC) was introduced at the 5' end of the lacZ
operon by the mutation of bases C6003 and G6004 to A and T
respectively (numbering of Messing). This mutation also
destroyed a unique NarI site. A SalI recognition site
(GTCGAC) was introduced at the 3' end of the operon by
the mutation of bases A6430 and C6432 to C and A
respectively. A construct combining these variants of
M13mp18 was designated M13-MB1/2.

B) Synthetic Gene.

A synthetic gene
(VIII-signal-sequence::mature-bpti::mature-VIII-coat-protein)
was constructed from 16
synthetic oligonucleotides (Table 105), custom
synthesized by Genetic Designs Inc. of Houston, Texas,
using methods detailed in KIMH89 and ASHM89. Table 101
shows the DNA sequence; Table 102 contains an annotated
version of this sequence. Table 103 shows the overlaps
of the synthetic oligonucleotides in relationship to the
restriction sites and coding sequence. Table 104 shows
the synthetic DNA in double-stranded form. Table 105
shows each of the 16 synthetic oligonucleotides from
5'-to-3'. The oligonucleotides were phosphorylated, with
the exception of the 5'-most molecules, using standard
methods, annealed and ligated in stages such that a
final synthetic duplex was generated. The overhanging
ends of this duplex were filled in with T4 DNA polymerase
and it was cloned into the HincII site of pGEM-3Zf(-);
the initial construct is called pGEM-MB1 (Table 101a).
Double-stranded DNA of pGEM-MB1 was cut with PstI,
filled in with T4 DNA polymerase and ligated to a SalI
linker (New England BioLabs) so that the synthetic gene
is bounded by BamHI and SalI sites (Table 101b and Table
102b). The synthetic gene was obtained on a BamHI-SalI
cassette and cloned into pGEM-MB3/4 and M13-MB1/2
utilizing the BamHI and SalI sites previously
introduced, to generate the constructs designated
pGEM-MB16 and M13-MB15, respectively. The full length of
the synthetic insert was sequenced and found to be
unambiguously correct except for: 1) a missing G in the
Shine-Dalgarno sequence; and 2) a few silent errors in
the third bases of some codons (shown as upper case in
Table 101). Table 102 shows the ribosome-binding site
A104GGAGG but the actual sequence is A104GAGG. Efforts to
express protein from this construction, in vivo and in
vitro, were unavailing.

C) Alterations to the synthetic gene.

i) Ribosome binding site (RBS).

Starting with the construct pGEM-MB16, a fragment
of DNA bounded by the restriction enzyme sites SacI and
NheI (containing the original RBS) was replaced with a
synthetic oligonucleotide duplex (with compatible SacI
and NheI overhangs) containing the sequence for a new
RBS that is very similar to the RBS of E. coli phoA and
that has been shown to be functional.

Original putative RBS (5'-to-3')

GAGCTCagaggCTTACTATGAAGAAATCTCTGGTTCTTAAGGCTAGC
|SacI|                                   |NheI|

New RBS (5'-to-3')

GAGCTCTggaggaAATAAAATGAAGAAATCTCTGGTTCTTAAGGCTAGC
|SacI|                                     |NheI|

The putative RBSs above are lower case and the 
initiating methionine codon is underscored and bold. 
The resulting construct was designated pGEM-MB20. In 
vitro expression of the gene carried by pGEM-MB20 
produced a novel protein species of the expected size, 
about 14.5 kd. 
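
The difference between the two SacI-NheI segments can be
checked directly from the sequences printed above. The
following sketch is an added illustration, not part of the
original work; treating GGAGG as the Shine-Dalgarno core
is an assumption made for this example, consistent with
the A104GGAGG site named in Table 102.

```python
import re

# SacI-NheI segments as printed above; lower case marks the putative RBS.
OLD_RBS = "GAGCTCagaggCTTACTATGAAGAAATCTCTGGTTCTTAAGGCTAGC"
NEW_RBS = "GAGCTCTggaggaAATAAAATGAAGAAATCTCTGGTTCTTAAGGCTAGC"

SD_CORE = "GGAGG"  # assumed Shine-Dalgarno core for this comparison


def describe(segment):
    """Summarize the RBS, its spacing to the first ATG, and the coding part."""
    rbs = re.search(r"[a-z]+", segment)           # the lower-case run
    upper = segment.upper()
    start = upper.find("ATG", rbs.end())          # first ATG after the RBS
    return {
        "rbs": rbs.group().upper(),
        "spacer": start - rbs.end(),              # bases between RBS and ATG
        "has_sd_core": SD_CORE in upper,
        "coding": upper[start:],
        "flanked": upper.startswith("GAGCTC") and upper.endswith("GCTAGC"),
    }


old, new = describe(OLD_RBS), describe(NEW_RBS)
```

Under these assumptions the coding portion downstream of
the ATG is the same in both segments, and only the new
segment contains the full GGAGG core.
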
ii) tac promoter. 

In order to obtain higher expression levels of the
fusion protein, the lacUV5 promoter was changed to a tac
promoter. Starting with the construct pGEM-MB16, which
contains the lacUV5 promoter, a fragment of DNA bounded
by the restriction enzyme sites BamHI and HpaII was
excised and replaced with a compatible synthetic
oligonucleotide duplex containing the -35 sequence of
the trp promoter, cf. RUSS82. This converted the lacUV5
promoter to a tac promoter in a construct designated
pGEM-MB22, Table 112.

MB16

5'- GATCC tctagagtcggc TTTACA ctttatgcttc (cg-gctcg . . -3'
3'-     G agatctcagccg aaatgt gaaatacgaag gc (cgagc . . -5'
    BamHI              | -35 |            HpaII

MB22 insert

5'- GATCC actccccatccccctg TTGACA attaatcat    -3'
3'-     G tgaggggtagggggac AACTGT taattagtagc  -5'
    BamHI                  | -35 |      (HpaII)



Promoter and RBS variants of the fusion protein
gene were constructed by basic DNA manipulation
techniques to generate the following:

             Promoter   RBS   Encoded Protein
pGEM-MB16    lac        old   VIIIs.p.-BPTI-matureVIII
pGEM-MB20    lac        new   "
pGEM-MB22    tac        old   "
pGEM-MB26    tac        new   "

The synthetic genes from variants pGEM-MB20 and
pGEM-MB26 were recloned into the altered phage vector
M13-MB1/2 to generate the phage constructs designated
M13-MB27 and M13-MB28 respectively.

iii) Signal Peptide Sequence.

In vitro expression of the synthetic gene regulated
by tac and the "new" RBS produced a novel protein of the
expected size for the unprocessed protein (about 16 kd).
In vivo expression also produced novel protein of full
size; no processed protein could be seen on phage or in
cell extracts by silver staining or by Western analysis
with anti-BPTI antibody.

Thus we analyzed the signal sequence of the fusion.
Table 106 shows a number of typical signal sequences.
Charged residues are generally thought to be of great
importance and are shown bold and underscored. Each
signal sequence contains a long stretch of uncharged
residues that are mostly hydrophobic; these are shown in
lower case. At the right, in parentheses, is the length
of the stretch of uncharged residues. We note that the
fusions of gene VIII signal to BPTI and gene III signal
to BPTI have rather short uncharged segments. These
short uncharged segments may reduce or prevent
processing of the fusion peptides. We know that the
gene III signal sequence is capable of directing: a)
insertion of the peptide comprising
(mature-BPTI)::(mature-gene-III-protein) into the lipid
bilayer, and b) translocation of BPTI and most of the
mature gene III protein across the lipid bilayer (vide
infra). That the gene III protein remains anchored in
the lipid bilayer until the phage is assembled is due to
the uncharged anchor region near the carboxy terminus of
the mature gene III protein (see Table 116) and not to
the secretion signal sequence. The phoA signal sequence
can direct secretion of mature BPTI into the periplasm
of E. coli (MARK86). Furthermore, there is controversy
over the mechanism by which mature authentic gene VIII
protein comes to be in the lipid bilayer prior to phage
assembly.

Thus we decided to replace the DNA coding on
expression for the gene-VIII-putative-signal-sequence by
each of: 1) DNA coding on expression for the phoA signal
sequence, 2) DNA coding on expression for the bla signal
sequence, or 3) DNA coding on expression for the M13
gene III signal. Each of these replacements produces a
tripartite gene encoding a fusion protein that
comprises, in order: (a) a signal peptide that directs
secretion into the periplasm of parts (b) and (c),
derived from a first gene; (b) an initial potential
binding domain (BPTI in this case), derived from a
second gene (in this case, the second gene is an animal
gene); and (c) a structural packaging signal (the mature
gene VIII coat protein), derived from a third gene.
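
The "length of the stretch of uncharged residues" used in
the signal-sequence comparison above is simple to compute.
The sketch below is an added illustration: it assumes that
D, E, K and R are the charged residues, and it uses the
PhoA signal peptide as it is transcribed in the PhoA
sequence listing of this Example (residues up to the
lys-ala that precedes the arg-pro-asp of BPTI). It makes
no claim about the values printed in Table 106.

```python
CHARGED = set("DEKR")  # assumption: His is counted as uncharged here


def longest_uncharged_stretch(peptide):
    """Length of the longest run of residues that are not D, E, K or R."""
    best = current = 0
    for residue in peptide.upper():
        current = 0 if residue in CHARGED else current + 1
        best = max(best, current)
    return best


# PhoA signal peptide as transcribed in this Example (one-letter codes).
PHOA_SIGNAL = "MKQSTIALLPLLFTPVTKA"
phoa_stretch = longest_uncharged_stretch(PHOA_SIGNAL)
```

The same function could be applied to any of the other
signal-peptide fusions once their sequences are typed in.
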

The process by which the IPBD::packaging-signal
fusion arrives on the phage surface is illustrated in
Figure 1. In Figure 1a, we see that authentic gene VIII
protein appears (by whatever process) in the lipid
bilayer so that both the amino and carboxy termini are
in the cytoplasm. Signal peptidase-I cleaves the gene
VIII protein liberating the signal peptide (that is
absorbed by the cell) and mature gene VIII coat protein
that spans the lipid bilayer. Many copies of mature
gene VIII coat protein accumulate in the lipid bilayer
awaiting phage assembly (Figure 1c). Some signal
sequences are able to direct the translocation of quite
large proteins across the lipid bilayer. If additional
codons are inserted after the codons that encode the
cleavage site of the signal peptidase-I of such a potent
signal sequence, the encoded amino acids will be
translocated across the lipid bilayer as shown in Figure
1b. After cleavage by signal peptidase-I, the amino
acids encoded by the added codons will be in the
periplasm but anchored to the lipid bilayer by the
mature gene VIII coat protein, Figure 1d. The circular
single-stranded phage DNA is extruded through a part of
the lipid bilayer containing a high concentration of
mature gene VIII coat protein; the carboxy terminus of
each coat protein molecule packs near the DNA while the
amino terminus packs on the outside. Because the fusion
protein is identical to mature gene VIII coat protein
within the trans-bilayer domain, the fusion protein will
co-assemble with authentic mature gene VIII coat protein
as shown in Figure 1e.

In each case, the mature VIII coat protein moiety
is intended to co-assemble with authentic mature VIII
coat protein to produce phage particles having BPTI
domains displayed on the surface. The source and
character of the secretion signal sequence is not
important because the signal sequence is cut away and
degraded. The structural packaging signal, however, is
quite important because it must co-assemble with the
authentic coat protein to make a working virus sheath.

a) Bacterial Alkaline Phosphatase (phoA) Signal
Peptide.

Construct pGEM-MB26 contains a fragment of DNA
bounded by restriction enzyme sites SacI and AccIII
which contains the new RBS and sequences encoding the
initiating methionine and the signal peptide of M13 gene
VIII pro-protein. This fragment was replaced with a
synthetic duplex (constructed from four annealed
oligonucleotides) containing the RBS and DNA coding for
the initiating methionine and signal peptide of PhoA
(INOU82). The resulting construct was designated
pGEM-MB42; the sequence of the fusion gene is shown in
Table 113. M13MB48 is a derivative of GemMB42. A
BamHI-SalI DNA fragment from GemMB42, containing the
gene construct, was ligated into a similarly cleaved
vector M13MB1/2, giving rise to M13MB48.

PhoA RBS and signal peptide sequence

5'-GAGCTCCATGGGAGAAAATAAA.ATG.AAA.CAA.AGC.ACG.-
   |SacI|                 met lys gln ser thr

   .ATC.GCA.CTC.TTA.CCG.TTA.CTG.TTT.ACC.CCT.GTG.ACA.-
    ile ala leu leu pro leu leu phe thr pro val thr

   .AAA.GCC.CGT.CCG.GAT.-3'
    lys ala arg pro asp
              |AccIII|

b) beta-lactamase signal peptide.

To enable the introduction of the beta-lactamase
(amp) promoter and DNA coding for the signal peptide
into the gene encoding
(mature-BPTI)::(mature-VIII-coat-protein), an initial
manipulation of the amp gene (encoding beta-lactamase)
was required. Starting with pGEM-3Zf, an AccIII
recognition site (TCCGGA) was introduced into the amp
gene adjacent to the DNA sequence encoding the amino
acids at the beta-lactamase signal peptide cleavage
site. Using standard methods of in vitro site-directed
oligonucleotide mutagenesis, bases C2504 and A2501 were
converted to T and G respectively to generate the
construct designated pGEM-MB40. Further manipulation of
pGEM-MB40 entailed the insertion of a synthetic
oligonucleotide linker (CGGATCCG) containing the BamHI
recognition sequence (GGATCC) into the AatII site
(GACGTC starting at nucleotide number 2260) to generate
the construct designated pGEM-MB45. The DNA bounded by
the restriction enzyme sites of BamHI and AccIII
contains the amp promoter, amp RBS, initiating
methionine and beta-lactamase signal peptide. This
fragment was used to replace the corresponding fragment
from pGEM-MB26 to generate construct pGEM-MB46.

5'-GGATCCGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTT-
   TATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATAACC-
   CTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGT-

   ATG.AGT.ATT.CAA.CAT.TTC.CGT.GTC.GCC.CTT.ATT.-
   met ser ile gln his phe arg val ala leu ile

   CCC.TTT.TTT.GCG.GCA.TTT.TGC.CTT.CCT.GTT.TTT.-
   pro phe phe ala ala phe cys leu pro val phe

   GCT.CAT.CCG.-3'
   ala his pro ....

c) M13-gene-III-signal::bpti::mature-VIII-coat-protein

We may also construct, as depicted in Figure 5,
M13-MB51 which would carry a gene encoding a fusion of
M13-gene-III-signal-peptide to the previously described
BPTI::mature VIII coat protein. First the BstEII site
that follows the stop codons of the synthetic gene VIII
is changed to an AlwNI site as follows. DNA of
pGEM-MB26 is cut with BstEII and the ends filled in by
use of Klenow enzyme; a blunt AlwNI linker is ligated to
this DNA. This construction is called pGEM-MB26Alw. The
XhoI to AlwNI fragment (approximately 300 bp) of
pGEM-MB26Alw is purified. RF DNA from phage MK-BPTI
(vide infra) is cut with AlwNI and XhoI and the large
fragment purified. These two fragments are ligated
together; the resulting construction is named M13-MB51.
Because M13-MB51 contains no gene III, the phage cannot
form plaques. M13-MB51 can, however, render cells KmR.
Infectious phage particles can be obtained by use of
helper phage. As explained below, the gene III signal
sequence is capable of directing
(BPTI)::(mature-gene-III-protein) to the surface of
phage. In M13-MB51, we have inserted DNA encoding gene
VIII coat protein (50 amino acids) and three stop codons
5' to the DNA encoding the mature gene III protein.

Summary of signal peptide fusion protein variants.

protein) Genes

i) In vitro analysis 



A coupled transcription/translation prokaryotic 
system (Amersham Corp., Arlington Heights, IL) was 
utilized for the in vitro analysis of the protein 
products encoded by the BPTI/VIII synthetic gene and the 
variants derived from this. 

Table 107 lists the protein products encoded by the
listed vectors which are visualized by the standard
method of fluorography following in vitro synthesis in
the presence of 35S-methionine and separation of the
products using SDS polyacrylamide gel electrophoresis.
In each sample a pre-beta-lactamase product
(approximately 31 kd) can be seen. This is derived from
the amp gene which is the common selection gene for each
of the vectors. In addition, a (pre-BPTI/VIII) product
encoded by the synthetic gene and variants can be seen
as indicated. The migration of these species
(approximately 14.5 kd) is consistent with the expected
size of the encoded proteins.
ii) In vivo analysis. 

The vectors detailed in sections (B) and (C) were
freshly transfected into the E. coli strain XL1-Blue(TM)
(Stratagene, La Jolla, CA) and into strain SEF'. E. coli
strain SE6004 (LISS85) carries the prlA4 mutation and is
more permissive in secretion than strains that carry the
wild-type prlA allele. SE6004 is F- and is deleted for
lacI; thus the cells cannot be infected by M13 and the
lacUV5 and tac promoters cannot be regulated with IPTG.
Strain SEF' is derived from strain SE6004 (LISS85) by
crossing with XL1-Blue(TM); the F' in XL1-Blue(TM) carries
TcR and lacIq. SE6004 is streptomycinR, TcS while
XL1-Blue(TM) is streptomycinS, TcR so that both parental
strains can be killed with the combination of Tc and
streptomycin. SEF' retains the secretion-permissive
phenotype of the parental strain, SE6004 (prlA4).
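
The counter-selection used to isolate SEF' can be written
out as a small truth table. This is an added illustration;
the resistance profile given for SEF' (streptomycinR from
SE6004 plus TcR from the acquired F') is inferred from the
cross described above rather than stated explicitly.

```python
# Drug resistances of each strain. The SEF' entry is an inference (see lead-in).
RESISTANCES = {
    "SE6004": {"streptomycin"},
    "XL1-Blue": {"tetracycline"},
    "SEF'": {"streptomycin", "tetracycline"},
}


def survivors(strains, drugs):
    """Return the strains that are resistant to every drug in the medium."""
    return sorted(name for name, resistant_to in strains.items()
                  if set(drugs) <= resistant_to)


double_selection = survivors(RESISTANCES, ["tetracycline", "streptomycin"])
tc_only = survivors(RESISTANCES, ["tetracycline"])
```
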

The fresh transfectants were grown in NZCYM medium
(SAMB89) for 1 hour after which IPTG was added over the
range of concentrations 1.0 µM to 0.5 mM (to derepress
the lacUV5 and tac promoters) and grown for an
additional 1.5 hours.

Aliquots of the bacterial cells expressing the
synthetic insert encoded proteins together with the
appropriate controls (no vector, vector with no insert
and zero IPTG) were lysed in SDS gel loading buffer and
electrophoresed in 20% polyacrylamide gels containing
SDS and urea. Duplicate gels were either silver stained
(Daiichi, Tokyo, Japan) or electrotransferred to a nylon
matrix (Immobilon from Millipore, Bedford, MA) for
western analysis by standard means using rabbit
anti-BPTI polyclonal antibodies.

Table 108 lists the interesting proteins visualized 
on a silver stained gel and by western analysis of an 
identical gel. We can see clearly in the western 
analysis that protein species containing BPTI epitopes 
are present in the test strains which are absent from 
the control strains and which are also IPTG inducible. 
In XL1-Blue(TM), the migration of this species is
predominantly that of the unprocessed form of the
pro-protein although a small proportion of the encoded
proteins appear to migrate at a size consistent with
that of a fully processed form. In SEF', the processed
form predominates, there being only a faint band
corresponding to the unprocessed species.

Thus in strain SEF', we have produced a tripartite
fusion protein that is specifically cleaved after the
secretion signal sequence. We believe that the mature
protein comprises BPTI followed by the gene VIII coat
protein and that the coat protein moiety spans the
membrane. We believe that it is highly likely that one
or more copies, perhaps hundreds of copies, of this
protein will co-assemble into M13-derived phage or
M13-like phagemids. This construction will allow us to a)
mutagenize the BPTI domain, b) display each of the
variants on the coat of one or more phage (one type per
phage), and c) recover those phage that display variants
having novel binding properties with respect to target
materials of our choice.

Rasched and Oberer (RASC86) report that phage
produced in cells that express two alleles of gene VIII,
that have differences within the first 11 residues of
the mature coat protein, contain some of each protein.
Thus, because we have achieved in vivo processing of the
phoA(signal)::bpti::matureVIII fusion gene, it is highly
likely that co-expression of this gene with wild-type
VIII will lead to production of phage bearing BPTI
domains on their surface. Mutagenesis of the bpti
domain of these genes will provide a population of
phage, each phage carrying a gene that codes for the
variant of BPTI displayed on the phage surface.

VIII Display Phage: Production, Preparation and
Analysis.

i. Phage Production.

The OCV can be grown in XL1-Blue(TM) in the absence
of the inducing agent, IPTG. Typically, a plaque plug
is taken from a plate and grown in 2 ml of medium,
containing freshly diluted bacterial cells, for 6 to 8
hours. Following centrifugation of this culture the
supernatant is taken and the phage titer determined.
This is kept as a phage stock for further infection,
phage production and display of the gene product of
interest.

A 100-fold dilution of a fresh overnight culture of
SEF' bacterial cells in 500 ml of NZCYM medium is
allowed to grow to a cell density of 0.4 (Ab 600 nm) in a
shaker incubator at 37°C. To this culture is added a
sufficient amount of the phage stock to give a MOI of 10
together with IPTG to give a final concentration of 0.5
mM. The culture is allowed to grow for a further 2 hrs.
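
The "sufficient amount of the phage stock" follows from
the culture volume, the cell density and the stock titer.
The sketch below is an added illustration. The conversion
of 8e8 cells per ml per absorbance unit is a common rule
of thumb assumed here, not a value from this
specification, and the stock titer of 1e12 per ml is a
hypothetical example; both should be replaced by measured
values.

```python
def phage_stock_volume_ml(culture_ml, absorbance_600, moi,
                          stock_titer_per_ml, cells_per_ml_per_au=8e8):
    """Volume of phage stock needed to reach the requested MOI.

    cells_per_ml_per_au is an assumed conversion from absorbance to cells/ml.
    """
    total_cells = culture_ml * absorbance_600 * cells_per_ml_per_au
    return moi * total_cells / stock_titer_per_ml


# 500 ml culture at Ab600 = 0.4, MOI of 10, hypothetical stock at 1e12 per ml.
volume_ml = phage_stock_volume_ml(500, 0.4, 10, 1e12)
```

With these assumed numbers the culture holds about
1.6e11 cells, so roughly 1.6 ml of stock would be needed.
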
ii. Phage Preparation and Purification. 

The phage-producing bacterial culture is
centrifuged to separate the phage in the supernatant
from the bacterial pellet. To the supernatant is added
one quarter by volume of phage precipitation solution
(20% PEG, 3.75 M ammonium acetate) and PMSF to a final
concentration of 1 mM. It is left on ice for 2 hours
after which the precipitated phage is retrieved by
centrifugation. The phage pellet is redissolved in
Tris-EDTA containing 0.1% Sarkosyl and left at 4°C for 1
hour after which any bacteria and bacterial debris are
removed by centrifugation. The phage in the supernatant
is reprecipitated with PEG overnight at 4°C. The phage
pellet is resuspended in LB medium and reprecipitated
another two times to remove the detergent. The phage is
stored in LB medium at 4°C, titered and used for
analysis and binding studies.
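
Adding one quarter volume of the precipitation solution
dilutes it five-fold. The following lines are an added
worked calculation of the resulting final concentrations,
using only the figures given above.

```python
def final_concentration(stock_concentration, added_volume_fraction):
    """Concentration after adding `added_volume_fraction` volumes of stock
    to one volume of sample (e.g. 0.25 for 'one quarter by volume')."""
    return stock_concentration * added_volume_fraction / (1 + added_volume_fraction)


peg_percent = final_concentration(20.0, 0.25)         # from 20% PEG
ammonium_acetate_m = final_concentration(3.75, 0.25)  # from 3.75 M
```

The supernatant therefore ends up at about 4% PEG and
0.75 M ammonium acetate.
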

A more stringent phage purification scheme involves
centrifugation in a CsCl gradient. 3.86 g of CsCl is
dissolved in NET buffer (0.1 M NaCl, 1 mM EDTA, 0.1 M Tris
pH 7.7) up to a volume of 10 ml. 10^12 to 10^13 phage in TE
Sarkosyl buffer are mixed with 5 ml of CsCl NET buffer
and transferred to a sealable ultracentrifuge tube.
Centrifugation is performed overnight at 34K rpm in a
Sorvall OTD-65B Ultracentrifuge. The tubes are opened
and 400 µl aliquots are carefully removed. 5 µl
aliquots are removed from the fractions and analysed by
agarose gel electrophoresis after heating at 65°C for 15
minutes together with the gel loading buffer containing
0.1% SDS. Fractions containing phage are pooled, the
phage reprecipitated and finally redissolved in LB
medium to a concentration of 10^12 to 10^13 phage per ml.
iii. Phage Analysis. 

The display phage, together with appropriate
controls, are analyzed using standard methods of
polyacrylamide gel electrophoresis and either silver
staining of the gel or electrotransfer to a nylon matrix
followed by analysis with anti-BPTI antiserum (Western
analysis). Quantitation of the display of heterologous
proteins is achieved by running a serial dilution of the
starting protein, for example BPTI, together with the
display phage samples in the electrophoresis and Western
analyses described above. An alternative method
involves running a 2-fold serial dilution of a phage in
which both the major coat protein and the fusion protein
are visualized by silver staining. A comparison of the
relative ratios of the two protein species allows one to
estimate the number of fusion proteins per phage since
the number of VIII gene encoded proteins per phage
(approximately 3000) is known.
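
The estimate from the 2-fold dilution series amounts to a
one-line calculation. The sketch below is an added
illustration and rests on a simplifying assumption: that
two bands of equal silver-stain intensity contain equal
numbers of molecules, which ignores differences in size
and staining behaviour between the two proteins. The
example of five dilution steps is hypothetical.

```python
def fusion_copies_per_phage(matching_dilution_steps, coat_copies=3000, fold=2):
    """Estimate fusion proteins per phage.

    matching_dilution_steps: number of `fold` dilutions of the phage sample at
    which the major coat band matches the fusion band of the undiluted sample.
    coat_copies: gene VIII proteins per phage (approximately 3000, per the text).
    """
    return coat_copies / fold ** matching_dilution_steps


# Hypothetical reading: the coat band at the 5th 2-fold dilution matches the
# fusion band in the undiluted lane.
estimate = fusion_copies_per_phage(5)
```

Under that hypothetical reading the estimate is about 94
fusion proteins per phage; the answer is only as good as
the resolution of the dilution series.
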

Incorporation of fusion protein into bacteriophage. 

In vivo expression of the processed BPTI:VIII
fusion protein, encoded by vectors GemMB42 (above and 
Table 113) and M13MB48 (above) , implied that the 
processed fusion product was likely to be correctly 
located within the bacterial cell membrane. This 
localization made it possible that it could be 
incorporated into the phage and that the BPTI moiety 
would be displayed at the bacteriophage surface. 

SEF' cells were infected with either M13MB48
(consisting of the starting phage vector M13mp18,
altered as described above, containing the synthetic
gene consisting of a tac promoter, functional ribosome
binding site, phoA signal peptide, mature BPTI and
mature major coat protein) or M13mp18, as a control.
Phage infections, preparation and purification were
performed as described in Example VIII.

The resulting phage were electrophoresed
(approximately 10^11 phage per lane) in a 20%
polyacrylamide gel containing urea followed by
electrotransfer to a nylon matrix and western analysis
using anti-BPTI rabbit serum. A single species of
protein was observed in phage derived from infection
with the M13MB48 stock phage which was not observed in
the control infection. This protein had a migration of
about 12 kd, consistent with that of the fully processed
fusion protein.

Western analysis of SEF' bacterial lysate with or
without phage infection demonstrated another species of
protein of about 20 kd. This species was also present,
to a lesser degree, in phage preparations which were
simply PEG precipitated without further purification
(for example, using nonionic detergent or by CsCl
gradient centrifugation). A comparison of M13MB48 phage
preparations made in the presence or absence of detergent
demonstrated that sarkosyl treatment and CsCl gradient
purification did remove the bacterial contaminant while
having no effect on the presence of the BPTI:VIII fusion
protein. This indicates that the fusion protein has
been incorporated and is a constituent of the phage
body.

The time course of phage production and BPTI:VIII
incorporation was followed post-infection and after IPTG
induction. Phage production and fusion protein
incorporation appeared to be maximal after two hours.
This time course was utilized in further phage
productions and analyses.

Polyacrylamide electrophoresis of the phage
preparations, followed by silver staining, demonstrated
that the preparations were essentially free of
contaminating protein species and that an extra protein
band was present in M13MB48 derived phage which was not

                         M   K   K   S  - rest of VIII
             ACT.TCC.TC.ATG.AAA.AAG.TCT.   (SEQ ID NOs: 96 and 97)
rest of XI -  T   S   S  stop

(The amino acid sequence MKKS has SEQ ID NO: 9)

Site-specific mutagenesis.

                        (L)  K   K   S  - rest of VIII
             ACT.TCC.AG.CTG.AAA.AAG.TCT.   (SEQ ID NOs: 98 and 99)
rest of XI -  T   S   S  stop

Note that the 3' end of the XI gene overlaps with
the 5' end of the VIII gene. Changes in DNA sequence
were designed such that the desired change in the VIII
gene product could be achieved without alterations to
the predicted amino acid sequence of the gene XI
product. A diagnostic PvuII recognition site was
introduced at this site.
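
The design constraint described here (change the gene VIII start codon, leave the overlapping upstream reading frame untouched, and create a PvuII site) can be checked directly from the two sequences shown. The following is a minimal sketch: the codon table is deliberately limited to the codons present in these fragments, and the variable and function names are illustrative.

```python
# Check the overlapping-frame design using only the sequences shown above.
WILD_TYPE = "ACTTCCTCATGAAAAAGTCT"  # ACT.TCC.TC.ATG.AAA.AAG.TCT
MUTANT = "ACTTCCAGCTGAAAAAGTCT"     # ACT.TCC.AG.CTG.AAA.AAG.TCT
PVU_II_SITE = "CAGCTG"

# Only the codons that occur in these two fragments ("*" marks a stop codon).
CODONS = {
    "ACT": "T", "TCC": "S", "TCA": "S", "AGC": "S", "TCT": "S",
    "ATG": "M", "CTG": "L", "AAA": "K", "AAG": "K", "TGA": "*",
}


def translate(dna, start):
    """Translate from index `start`, stopping after the first stop codon."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        residue = CODONS[dna[i:i + 3]]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


UPSTREAM_FRAME = 0   # frame of the overlapping upstream gene (ends T-S-S-stop)
GENE_VIII_FRAME = 8  # index of the first gene VIII codon (ATG or CTG)

for name, seq in (("wild type", WILD_TYPE), ("mutant", MUTANT)):
    print(name,
          translate(seq, UPSTREAM_FRAME),
          translate(seq, GENE_VIII_FRAME),
          PVU_II_SITE in seq)
```

Both sequences translate to T-S-S-stop in the upstream frame, the gene VIII frame changes from MKKS to LKKS, and only the mutant contains CAGCTG.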

It was anticipated that initiation of the natural 
gene VIII product would be hindered, enabling a higher 
proportion of the fusion protein to be incorporated into 
the resulting phage. 

Analyses of the phage derived from this modified 
vector indicated that there was a significant increase 
in the ratio of fusion protein to major coat protein. 
Quantitative estimates indicated that within a phage
population as many as 100 copies of the BPTI:VIII fusion
were incorporated per phage.

Incorporation of interdomain extension fusion proteins
into phage.

A phage pool containing a variegated pentapeptide
extension at the BPTI:coat protein interface (see
Example VII) was used to infect SEF' cells. IPTG
induction, phage production and preparation were as
described in Example VIII. Using the criteria detailed
in the previous section, it was determined that extended
fusion proteins were incorporated into phage. Gel
electrophoresis of the generated phage, followed by
either silver staining or western analysis with
anti-BPTI rabbit serum, demonstrated fusion proteins that
migrated similarly to, but discernibly slower than, the
starting fusion protein.

With regard to the 'EGGGS linker' (SEQ ID NO: 10)
extensions of the domain interface, individual phage
stocks predicted to contain one or more 5-amino-acid
unit extensions were analyzed in a similar fashion. The
migration of the extended fusion proteins was readily
distinguishable from that of the parent fusion protein
when viewed by western analysis or silver staining. Those
clones analyzed in more detail included M13.3X4 (which
contains a single inverted EGGGS (SEQ ID NO: 10) linker
with a predicted amino acid sequence of GSSSL (SEQ ID
NO: 16)), M13.3X7 (which contains a correctly orientated
linker with a predicted amino acid sequence of EGGGS
(SEQ ID NO: 10)), M13.3X11 (which contains 3 linkers with
an inversion and a predicted amino acid sequence for the
extension of EGGGSGSSSLGSSSL (SEQ ID NO: 11)) and M13.3Xd
(which contains an extension consisting of at least 5
linkers or 25 amino acids).

The extended fusion proteins were all incorporated
into phage at high levels (on average tens of copies per
phage were present) and, when analyzed by gel
electrophoresis, migrated at rates consistent with the
predicted size of the extension. Clones M13.3X4 and
M13.3X7 migrated at a position very similar to but
discernibly different from the parent fusion protein,
while M13.3X11 and M13.3Xd were markedly larger.

Display of BPTI:VIII fusion protein by bacteriophage.

The BPTI:VIII fusion protein had been shown to be 
incorporated into the body of the phage. This phage was 
analyzed further to demonstrate that the BPTI moiety was 
accessible to specific antibodies and hence displayed at 
the phage surface.

The assay is detailed in Example II, but
principally involves the addition of purified anti-BPTI
IgG (from the serum of BPTI injected rabbits) to a known
titer of phage. Following incubation, protein A-agarose
beads are added to bind the IgG and left to incubate
overnight. The IgG-protein A beads and any bound phage
are removed by centrifugation followed by a retitering
of the supernatant to determine any loss of phage. The
phage bound to the beads can be acid eluted and titered
also. Appropriate controls are included in the assay,
such as a wild type phage stock (M13mp18) and IgG
purified from normal rabbit pre-immune serum.

Table 140 shows that while the titer of the wild
type phage is unaltered by the presence of anti-BPTI
IgG, BPTI-IIIMK (the positive control for the assay)
demonstrated a significant drop in titer with or without
the extra addition of protein A beads. (Note that since
the BPTI moiety is part of the III gene product which is
involved in the binding of phage to bacterial pili, such
a phenomenon is entirely expected.) Two batches of
M13MB48 phage (containing the BPTI:VIII fusion protein)
demonstrated a significant reduction in titer, as judged
by plaque forming units, when anti-BPTI antibodies and
protein A beads were added to the phage. The initial
drop in titer with the antibody alone differs somewhat
between the two batches of phage. This may be a result
of experimental or batch variation. Retrieval of the
immunoprecipitated phage, while not quantitative, was
significant when compared to the wild type phage
control.

Further control experiments relating to this
section are shown in Table 141 and Table 142. The data
demonstrated that the loss in titer observed for the
BPTI:VIII containing phage is a result of the display of
BPTI epitopes by these phage and the specific
interaction with anti-BPTI antibodies. No significant
interaction with either protein A agarose beads or IgG
purified from normal rabbit serum could be demonstrated.
The larger drop in titer for M13MB48 batch five reflects
the higher level incorporation of the fusion protein in
this preparation.

Functionality of the BPTI moiety in the BPTI-VIII 
display phage. 

The previous two sections demonstrated that the 
BPTI:VIII fusion protein has been incorporated into the 
phage body and that the BPTI moiety is displayed at the 
phage surface. To demonstrate that the displayed
molecule is functional, binding experiments were
performed in a manner almost identical to that described 
in the previous section except that proteases were used 
in place of antibodies. The display phage, together 
with appropriate controls, are allowed to interact with 
immobilized proteases or immobilized inactivated 
proteases. Binding can be assessed by monitoring the 
loss in titer of the display phage or by determining the 
number of phage bound to the respective beads. 
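
Both read-outs mentioned here reduce to simple arithmetic on plaque counts. The sketch below shows one way to compute them; the function names and all numbers are hypothetical and are not taken from the tables referenced in this Example.

```python
def titer_pfu_per_ml(plaques, dilution_factor, volume_plated_ml):
    """Plaque-forming units per ml of the undiluted sample."""
    return plaques * dilution_factor / volume_plated_ml


def percent_loss_in_titer(titer_before, titer_after):
    """Percentage of infectious phage removed from the supernatant."""
    return 100.0 * (titer_before - titer_after) / titer_before


def percent_recovered(eluted_pfu, input_pfu):
    """Percentage of input phage recovered from the beads after elution."""
    return 100.0 * eluted_pfu / input_pfu


# Hypothetical plaque counts, for illustration only.
before = titer_pfu_per_ml(plaques=200, dilution_factor=1e6, volume_plated_ml=0.5)
after = titer_pfu_per_ml(plaques=60, dilution_factor=1e6, volume_plated_ml=0.5)
print(percent_loss_in_titer(before, after))
```

A large loss in titer for the display phage, with little or no loss for the non-display control, is the pattern that indicates specific binding.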

Table 143 shows the results of an experiment in
which BPTI:VIII display phage, M13MB48, were allowed to
bind to anhydrotrypsin-agarose beads. There was a
significant drop in titer when compared to wild type
phage, which do not display BPTI. A pool of phage (5AA
Pool), each containing a variegated 5 amino acid extension
at the BPTI:major coat protein interface, demonstrated a
similar decline in titer. In a control experiment
(Table 143) very little non-specific binding of the
above display phage was observed with agarose beads to
which an unrelated protein (streptavidin) is attached.

Actual binding of the display phage is demonstrated
by the data shown for two experiments in Table 144. The
negative control is wild type M13mp18 and the positive
control is BPTI-IIIMK, a phage in which the BPTI moiety,
attached to the gene III protein, has been shown to be
displayed and functional. M13MB48 and M13MB56 both bind
to anhydrotrypsin beads in a manner comparable to that
of the positive control, being 40 to 60 times better
than the negative control (non-display phage). Hence
functionality of the BPTI moiety, in the major coat
fusion protein, was established.

To take this analysis one step further, a
comparison of phage binding to active and inactivated
trypsin is shown in Table 145. The control phage,
M13mp18 and BPTI-III MK, demonstrated binding similar to
that detailed in Example III. Note that the relative
binding is enhanced with trypsin due to the apparent
marked reduction in the non-specific binding of the wild
type phage to the active protease. M13.3X7 and
M13.3X11, which both contain 'EGGGS' linker (SEQ ID
NO: 10) extensions at the domain interface, bound to
anhydrotrypsin and trypsin in a manner similar to
BPTI-IIIMK phage. The binding, relative to non-display
phage, was approximately 100 fold higher in the
anhydrotrypsin binding assay and at least 1000 fold
higher in the trypsin binding assay. The binding of
another 'EGGGS' linker variant (M13.3Xd) was similar to
that of M13.3X7.

To demonstrate the specificity of binding, the
assays were repeated with human neutrophil elastase
(HNE) beads and compared to that seen with trypsin beads
(Table 146). BPTI has a very high affinity for trypsin
and a low affinity for HNE, hence the BPTI display phage
should reflect these affinities when used in binding
assays with these beads. The negative and positive
controls for trypsin binding were as already described
above, while an additional positive control for the HNE
beads, BPTI(K15L,MGNG)-III MA (see Example III), was
included. (The amino acid sequence MGNG has SEQ ID NO: 12;
BPTI(...,MGNG) denotes a homologue of BPTI having M39,
G40, N41, G42, where ... may indicate other
alterations.) The results, shown in Table 146, confirmed
this prediction. M13MB48, M13.3X7 and M13.3X11 phage
demonstrated good binding to trypsin, relative to wild
type phage and the HNE control (BPTI(K15L,MGNG)-III MA),
being comparable to BPTI-IIIMK phage.
Conversely, poor binding occurred when HNE beads were
used, with the exception of the HNE positive control
phage.

Taken together, the accumulated data demonstrated
that when BPTI is part of a fusion protein with the
major coat protein of M13 phage, the molecule is
displayed at the surface of the phage and a significant
proportion of it is functional in a specific
protease-binding manner.

* * * 

EXAMPLE II 

CONSTRUCTION OF BPTI/GENE-III DISPLAY VECTOR 

DNA manipulations were conducted according to
standard procedures as described in Maniatis et al.
(MANI82). First the unwanted lacZ gene of M13-MB1/2 was
removed. M13-MB1/2 RF was cut with BamHI and SalI and
the large fragment was isolated by agarose gel
electrophoresis. The recovered 6819 bp fragment was
filled in with Klenow fragment of E. coli DNA polymerase
and ligated to a synthetic HindIII 8mer linker
(CAAGCTTG). The ligation sample was used to transfect
competent XL1-Blue(TM) (Stratagene, La Jolla, CA) cells
which were subsequently plated for plaque formation. RF
DNA was prepared from chosen plaques and a clone,
M13-MB1/2-delta, containing regenerated BamHI and SalI
sites as well as a new HindIII site, all 500 bp upstream
of the BglII site (6935), was picked.

A unique NarI site was introduced into codons 17
and 18 of gene III (changing the amino acids from H-S to
G-A, cf. Table 110). 10^6 phage produced from bacterial
cells harboring the M13-MB1/2-delta RF DNA were used to
infect a culture of CJ236 cells (relevant genotype: F',
dut1, ung1, CmR) (OD595 = 0.35). Following overnight
incubation at 37°C, phage were recovered and
uracil-containing ss DNA was extracted from phage in accord
with the instructions for the MUTA-GENE M13 in vitro
Mutagenesis Kit (Catalogue Number 170-3571, Bio-Rad,
Richmond, CA). Two hundred nanograms of the purified
single stranded DNA was annealed to 3 picomoles of a
phosphorylated 25mer mutagenic oligonucleotide
(5'-gtttcagcggCgCCagaatagaaag-3',
where upper case indicates the changes). Following
filling in with T4 DNA polymerase and ligation with T4
DNA ligase, the reaction sample was used to transfect
competent XL1-Blue(TM) cells which were subsequently
plated to permit the formation of plaques.

RF DNA, isolated from phage-infected cells which
had been allowed to propagate in liquid culture for 8
hours, was denatured, spotted on a Nytran membrane,
baked and hybridized to the 25mer mutagenic
oligonucleotide which had previously been phosphorylated
with 32P-ATP. Clones exhibiting strong hybridization
signals at 70°C (6°C less than the theoretical Tm of the
mutagenic oligonucleotide) were chosen for large scale
RF preparation. The presence of a unique NarI site at
nucleotide 1630 was confirmed by restriction enzyme
analysis. The resultant RF DNA, M13-MB1/2-delta-NarI,
was cut with BamHI, dephosphorylated with calf
intestinal phosphatase, and ligated to a 1.3 Kb BamHI
fragment, encoding the kanamycin-resistance gene (kan),
derived from plasmid pUC4K (Pharmacia, Piscataway, NJ).
The ligation sample was used to transfect competent
XL1-Blue(TM) cells which were subsequently plated onto LB
plates containing kanamycin (Km). RF DNA from
KmR colonies was prepared and subjected to restriction
enzyme analysis to confirm the insertion of kan into
M13-MB1/2-delta-NarI DNA, thereby creating the phage MK.
Phage MK grows as well as wild-type M13, indicating that
the changes at the cleavage site of gene III protein are
not detectably deleterious to the phage.
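
As a quick check on the hybridization temperature quoted above: the text does not say how the theoretical Tm was computed, but the Wallace rule of thumb (2°C per A/T, 4°C per G/C), assumed here purely for illustration, gives 76°C for the 25mer, which is 6°C above the 70°C hybridization temperature. The sketch also confirms that the oligonucleotide carries a NarI recognition sequence (GGCGCC). The function name is illustrative.

```python
# 25mer mutagenic oligonucleotide from the text; upper case marks the changes.
OLIGO = "gtttcagcggCgCCagaatagaaag"
NAR_I_SITE = "GGCGCC"


def wallace_tm(oligo):
    """Wallace rule estimate: Tm = 4 * (G + C) + 2 * (A + T), in degrees C.

    This rule of thumb is an assumption of this sketch; the source does not
    state which Tm formula was used.
    """
    seq = oligo.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at


changed_positions = [i + 1 for i, base in enumerate(OLIGO) if base.isupper()]

print(len(OLIGO))                   # 25
print(wallace_tm(OLIGO))            # 76
print(NAR_I_SITE in OLIGO.upper())  # True
print(changed_positions)            # [11, 13, 14]
```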
INSERTION OF SYNTHETIC BPTI GENE 

The construction of the BPTI-III expression vector
is shown in Figure 6. The synthetic bpti-VIII fusion
contains a NarI site that comprises the last two codons
of the BPTI-encoding region. A second NarI site was
introduced upstream of the BPTI-encoding region as
follows. RF DNA of phage M13-MB26 was cut with AccIII
and ligated to the dsDNA adaptor:

   5'-TATTCTGGCGCCCGT-3'
   3'-ATAAGACCGCGGGCAGGCC-5'
            | NarI |  | AccIII

The ligation sample was subsequently restricted with
NarI and a 180 bp DNA fragment encoding BPTI was
isolated by agarose gel electrophoresis. RF DNA of
phage MK was digested with NarI, dephosphorylated with
calf intestinal phosphatase and ligated to the 180 bp
fragment. Ligation samples were used to transfect
competent XL1-Blue(TM) cells which were plated to enable
the formation of plaques. DNA, isolated from phage
derived from plaques, was denatured, applied to a Nytran
membrane, baked and hybridized to a 32P-phosphorylated
double stranded DNA probe corresponding to the BPTI
gene. Large scale RF preparations were made for clones
exhibiting a strong hybridization signal. Restriction
enzyme digestion analysis confirmed the insertion of a
single copy of the synthetic BPTI gene into gene III of
MK to generate phage MK-BPTI. Subsequent DNA sequencing
confirmed that the sequence of the bpti-III fusion gene
is correct and that the correct reading frame is
maintained (Table 111). Table 116 shows the entire
coding region, the translation into protein sequence,
and the functional parts of the polypeptide chain.
EXPRESSION OF THE BPTI-III FUSION GENE IN VITRO 

MK-BPTI RF DNA was added to a coupled prokaryotic
transcription-translation extract (Amersham). Newly
synthesized radiolabelled proteins were produced and
subsequently separated by electrophoresis on a 15%
SDS-polyacrylamide gel, which was subjected to
fluorography. The MK-BPTI DNA directs the synthesis of an
unprocessed gene III fusion protein which is 7 Kd larger
than the gene III product encoded by MK. This is
consistent with the insertion of 58 amino acids of BPTI
into the gene III protein. Immunoprecipitation of
radiolabelled proteins generated by the cell-free
prokaryotic extract was conducted. Neither rabbit
anti-(M13-gene-VIII-protein) IgG nor normal rabbit IgG
was able to immunoprecipitate the gene III protein
encoded by either MK or MK-BPTI. However, rabbit
anti-BPTI IgG is able to immunoprecipitate the gene III
protein encoded by MK-BPTI but not by MK. This confirms
that the increase in size of the III protein encoded by
MK-BPTI is attributable to the insertion of the BPTI
protein.
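
The size shift can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not stated in the text; only the residue count (58) comes from the source.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb value, not from the text
BPTI_RESIDUES = 58               # number of BPTI amino acids inserted (from the text)


def approx_mass_kd(n_residues, residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Approximate polypeptide mass in kilodaltons."""
    return n_residues * residue_mass_da / 1000.0


print(round(approx_mass_kd(BPTI_RESIDUES), 1))  # 6.4
```

An insert of roughly 6.4 Kd is in line with the gel-based observation that the fusion product runs about 7 Kd larger than the unmodified gene III product.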
WESTERN ANALYSIS 

Phage were recovered from bacterial cultures by PEG
precipitation. To remove residual bacterial cells,
recovered phage were resuspended in a high salt buffer
and subjected to centrifugation, in accord with the
instructions for the MUTA-GENE(R) M13 in vitro
Mutagenesis Kit (Catalogue Number 170-3571, Bio-Rad,
Richmond, CA). Aliquots of phage (containing up to 40
µg of protein) were subjected to electrophoresis on a
12.5% SDS-urea-polyacrylamide gel and proteins were
transferred to a sheet of Immobilon by electro-transfer.
Western blots were developed using rabbit anti-BPTI
serum, which had previously been incubated with an E.
coli extract, followed by goat anti-rabbit antibody
conjugated to alkaline phosphatase. An immunoreactive
protein of 67 Kd is detected in preparations of the
MK-BPTI but not the MK phage. The size of the
immunoreactive protein is consistent with the predicted
size of a processed BPTI-III fusion protein (6.4 Kd plus
60 Kd). These data indicate that BPTI-specific epitopes
are presented on the surface of the MK-BPTI phage but
not the MK phage.

NEUTRALIZATION OF PHAGE TITER WITH AGAROSE-IMMOBILIZED
ANHYDRO-TRYPSIN

Anhydro-trypsin is a derivative of trypsin in which
the active site serine has been converted to
dehydroalanine. Anhydro-trypsin retains the specific
binding of trypsin but not the protease activity.
Unlike polyclonal antibodies, anhydro-trypsin is not
expected to bind unfolded BPTI or incomplete fragments.

Phage MK-BPTI and MK were diluted to a
concentration of 1.4 × 10^12 particles per ml in TBS buffer
(PARM88) containing 1.0 mg/ml BSA. Thirty microliters
of diluted phage were added to 2, 5, or 10 microliters
of a 50% slurry of agarose-immobilized anhydro-trypsin
(Pierce Chemical Co., Rockford, IL) in TBS/BSA buffer.
Following incubation at 25°C, aliquots were removed,
diluted in ice cold LB broth and titered for
plaque-forming units on a lawn of XL1-Blue(TM) cells. Table 114
illustrates that incubation of the MK-BPTI phage with
immobilized anhydro-trypsin results in a very
significant loss in titer over a four hour period while
no such effect is observed with the MK (control) phage.
The reduction in phage titer is also proportional to the
amount of immobilized anhydro-trypsin added to the
MK-BPTI phage. Incubation with five microliters of a 50%
slurry of agarose-immobilized streptavidin (Sigma, St.
Louis, MO) in TBS/BSA buffer does not reduce the titer
of either the MK-BPTI or MK phage. These data are
consistent with the presentation of a correctly-folded,
functional BPTI protein on the surface of the MK-BPTI
phage but not on the MK phage. Unfolded or incomplete
BPTI domains are not expected to bind anhydro-trypsin.
Furthermore, unfolded BPTI domains are expected to be
non-specifically sticky.

NEUTRALIZATION OF PHAGE TITER WITH ANTI-BPTI ANTIBODY 

MK-BPTI and MK phage were diluted to a
concentration of 4 × 10^8 plaque-forming units per ml in LB
broth. Fifteen microliters of diluted phage were added
to an equivalent volume of either rabbit anti-BPTI serum
or normal rabbit serum (both diluted 10 fold in LB
broth). Following incubation at 37°C, aliquots were
removed, diluted by 10^4 in ice-cold LB broth and titered
for plaque-forming units on a lawn of XL1-Blue(TM) cells.
Incubation of the MK-BPTI phage with anti-BPTI serum
results in a steady loss in titer over a two hour period
while no such effect is observed with the MK phage. As
expected, normal rabbit serum does not reduce the titer
of either the MK-BPTI or the MK phage. Prior incubation
of the anti-BPTI serum with authentic BPTI protein, but
not with an equivalent amount of E. coli protein, blocks
the ability of the serum to reduce the titer of the
MK-BPTI phage. These data are consistent with the
presentation of BPTI-specific epitopes on the surface of
the MK-BPTI phage but not the MK phage. More
specifically, the data indicate that these BPTI
epitopes are associated with the gene III protein and
that association of this fusion protein with an
anti-BPTI antibody blocks its ability to mediate the
infection of bacterial cells.

NEUTRALIZATION OF PHAGE TITER WITH TRYPSIN 

MK-BPTI and MK phage were diluted to a
concentration of 4 × 10^8 plaque-forming units per ml in LB
broth. Diluted phage were added to an equivalent volume
of trypsin diluted to various concentrations in LB
broth. Following incubation at 37°C, aliquots were
removed, diluted by 10^4 in ice cold LB broth and titered
for plaque-forming units on a lawn of XL1-Blue(TM) cells.
Incubation of the MK-BPTI phage with 0.15 µg of trypsin
results in a 70% loss in titer after a two hour period
while only a 15% loss in titer is observed for the MK
phage. A reduction in the amount of trypsin added to
phage results in a reduction in the loss of titer.
However, at all trypsin concentrations investigated,
the MK-BPTI phage are more sensitive to incubation with
trypsin than the MK phage. An interpretation of these
data is that association of the BPTI-III fusion protein
displayed on the surface of the MK-BPTI phage with
trypsin blocks its ability to mediate the infection of
bacterial cells.

The reduction in titer of phage MK by trypsin is an
example of a phenomenon that is likely to be general:
proteases, if present in sufficient quantity, will
degrade proteins on the phage and reduce infectivity.
The present application lists several means that can be
used to overcome this problem.

AFFINITY SELECTION SYSTEM

Affinity Selection with Immobilized Anhydro-Trypsin

MK-BPTI and MK phage were diluted to a
concentration of 1.4 × 10^12 particles per ml in TBS buffer
(PARM88) containing 1.0 mg/ml BSA. We added 4.0 × 10^10
phage to 5 microliters of a 50% slurry of either
agarose-immobilized anhydro-trypsin beads (Pierce
Chemical Co.) or agarose-immobilized streptavidin beads
(Sigma) in TBS/BSA. Following a 3 hour incubation at
room temperature, the beads were pelleted by
centrifugation for 30 seconds at 5000 rpm in a microfuge
and the supernatant fraction was collected. The beads
were washed 5 times with TBS/Tween buffer (PARM88) and
after each wash the beads were pelleted by
centrifugation and the supernatant was removed.
Finally, beads were resuspended in elution buffer (0.1 N
HCl containing 1.0 mg/ml BSA adjusted to pH 2.2 with
glycine) and following a 5 minute incubation at room
temperature, the beads were pelleted by centrifugation.
The supernatant was removed and neutralized by the
addition of 1.0 M Tris-HCl buffer, pH 8.0.

Aliquots of phage samples were applied to a Nytran
membrane using a Schleicher and Schuell (Keene, NH)
filtration minifold and phage DNA was immobilized onto
the Nytran by baking at 80°C for 2 hours. The baked
filter was incubated at 42°C for 1 hour in pre-wash
solution (MANI82) and pre-hybridization solution
(5Prime-3Prime, West Chester, PA). The 1.0 Kb NarI

select for phage displaying a specific folded protein
(in this case, BPTI). Unfolded or incomplete BPTI
domains are not expected to bind anhydro-trypsin.

Affinity Selection with Anti-BPTI Antibodies

MK-BPTI and MK phage were diluted to a
concentration of 1 × 10^10 particles per ml in Tris buffered
saline solution (PARM88) containing 1.0 mg/ml BSA.
2 × 10^8 phage were added to 2.5 µg of either biotinylated
rabbit anti-BPTI IgG in TBS/BSA or biotinylated rabbit
anti-mouse antibody IgG (Sigma) in TBS/BSA, and
incubated overnight at 4°C. A 50% slurry of
streptavidin-agarose (Sigma), washed three times with
TBS buffer prior to incubation with 30 mg/ml BSA in TBS
buffer for 60 minutes at room temperature, was washed
three times with TBS/Tween buffer (PARM88) and
resuspended to a final concentration of 50% in this
buffer. Samples containing phage and biotinylated IgG
were diluted with TBS/Tween prior to the addition of
streptavidin-agarose in TBS/Tween buffer. Following a
60 minute incubation at room temperature,
streptavidin-agarose beads were pelleted by centrifugation for 30
seconds and the supernatant fraction was collected. The
beads were washed 5 times with TBS/Tween buffer and
after each wash, the beads were pelleted by
centrifugation and the supernatant was removed.
Finally, the streptavidin-agarose beads were resuspended
in elution buffer (0.1 N HCl containing 1.0 mg/ml BSA
adjusted to pH 2.2 with glycine), incubated 5 minutes at

to elution buffer releases bound phage only in the case
of MK-BPTI phage which have previously been incubated
with biotinylated rabbit anti-BPTI IgG. These data
indicate that the affinity selection system described
above can be utilized to select for phage displaying a
specific antigen (in this case BPTI). We estimate an
enrichment factor of at least 40 fold based on the
calculation

Enrichment Factor = (Percent MK-BPTI phage recovered) /
                    (Percent MK phage recovered)
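
The enrichment calculation can be sketched in a few lines. The phage counts below are hypothetical and were chosen only so that the ratio comes out at the 40-fold level mentioned in the text; they are not measured values, and the function names are illustrative.

```python
def percent_recovered(eluted_phage, applied_phage):
    """Percentage of applied phage recovered in the acid eluate."""
    return 100.0 * eluted_phage / applied_phage


def enrichment_factor(percent_display, percent_control):
    """Ratio of percent display phage recovered to percent control phage recovered."""
    if percent_control <= 0:
        raise ValueError("control recovery must be positive")
    return percent_display / percent_control


# Hypothetical counts, for illustration only.
mk_bpti = percent_recovered(eluted_phage=4e6, applied_phage=2e8)  # 2.0 %
mk = percent_recovered(eluted_phage=1e5, applied_phage=2e8)       # 0.05 %
print(round(enrichment_factor(mk_bpti, mk), 1))
```

Because the factor is a ratio of percentages, the two phage stocks do not need to be applied in exactly equal numbers, as long as each recovery is normalized to its own input.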

EXAMPLE III 

CHARACTERIZATION AND FRACTIONATION OF CLONALLY PURE 
POPULATIONS OF PHAGE, EACH DISPLAYING A SINGLE CHIMERIC 
APROTININ HOMOLOGUE/M13 GENE III PROTEIN: 

This Example demonstrates that chimeric phage
proteins displaying a target-binding domain can be
eluted from immobilized target by decreasing pH, and
the pH at which the protein is eluted is dependent on
the binding affinity of the domain for the target.

Standard Procedures:

Unless otherwise noted, all manipulations were
carried out at room temperature. Unless otherwise
noted, all cells are XL1-Blue(TM) (Stratagene, La Jolla,
CA).

1) Demonstration of the Binding of BPTI-III MK Phage to 
Active Trypsin Beads 

Previous experiments designed to verify that BPTI
displayed by fusion phage is functional relied on the
use of immobilized anhydro-trypsin, a catalytically
inactive form of trypsin. Although anhydro-trypsin is
essentially identical to trypsin structurally (HUBE75,
YOKO77) and in binding properties (VINC74, AKOH72), we
demonstrated that BPTI-III fusion phage also bind
immobilized active trypsin. Demonstration of the
binding of fusion phage to immobilized active protease
and subsequent recovery of infectious phage facilitates
subsequent experiments where the preparation of inactive
forms of serine proteases by protein modification is
laborious or not feasible.

Fifty µl of BPTI-III MK phage (identified as
MK-BPTI in Example II) (3.7 × 10^11 pfu/ml) in either 50 mM
Tris, pH 7.5, 150 mM NaCl, 1.0 mg/ml BSA (TBS/BSA)
buffer or 50 mM sodium citrate, pH 6.5, 150 mM NaCl, 1.0
mg/ml BSA (CBS/BSA) buffer were added to 10 µl of a 25%
slurry of immobilized trypsin (Pierce Chemical Co.,
Rockford, IL) also in TBS/BSA or CBS/BSA. As a control,
50 µl MK phage (9.3 × 10^12 pfu/ml) were added to 10 µl of a
25% slurry of immobilized trypsin in either TBS/BSA or
CBS/BSA buffer. The infectivity of BPTI-III MK phage is
25-fold lower than that of MK phage; thus the conditions
chosen above ensure that an approximately equivalent
number of phage particles are added to the trypsin
beads. After 3 hours of mixing on a Labquake shaker
(Labindustries Inc., Berkeley, CA) 0.5 ml of either
TBS/BSA or CBS/BSA was added where appropriate to the
samples. Beads were washed for 5 min and recovered by
centrifugation for 30 sec. The supernatant was removed
and 0.5 ml of TBS/0.1% Tween-20 was added. The beads
were mixed for 5 minutes on the shaker and recovered by
centrifugation as above. The supernatant was removed
and the beads were washed an additional five times with
TBS/0.1% Tween-20 as described above. Finally, the
beads were resuspended in 0.5 ml of elution buffer (0.1
M HCl containing 1.0 mg/ml BSA adjusted to pH 2.2 with
glycine), mixed for 5 minutes and recovered by
centrifugation. The supernatant fraction was removed
and neutralized by the addition of 130 µl of 1 M Tris,
pH 8.0. Aliquots of the neutralized elution sample were
diluted in LB broth and titered for plaque-forming units
on a lawn of cells.
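
The statement that the two inputs contain an approximately equivalent number of particles follows from the titers and the 25-fold infectivity difference given above. A minimal sketch of that arithmetic, with an illustrative function name and with particle numbers expressed relative to MK phage (treated here as one particle per pfu, an assumption of the sketch):

```python
def relative_particle_input(pfu_per_ml, volume_ul, infectivity_penalty=1.0):
    """Particles added, in units where the control phage has one particle per pfu.

    infectivity_penalty: how many times less infectious the phage is than the
    control, so that particles = pfu * penalty.
    """
    return pfu_per_ml * (volume_ul / 1000.0) * infectivity_penalty


# Titers and volumes from the text; BPTI-III MK is 25-fold less infectious than MK.
fusion = relative_particle_input(3.7e11, 50, infectivity_penalty=25)
control = relative_particle_input(9.3e12, 50)

print(round(fusion / control, 2))  # 0.99
```

With these inputs the two samples differ by less than one percent in estimated particle number.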

Table 201 illustrates that a significant percentage
of the input BPTI-III MK phage bound to immobilized
trypsin and was recovered by washing with elution
buffer. The amount of fusion phage which bound to the
beads was greater in TBS buffer (pH 7.5) than in CBS
buffer (pH 6.5). This is consistent with the
observation that the affinity of BPTI for trypsin is
greater at pH 7.5 than at pH 6.5 (VINC72, VINC74). A
much lower percentage of the MK control phage (which do
not display BPTI) bound to immobilized trypsin and this
binding was independent of the pH conditions. At pH
6.5, 1675 times more of the BPTI-III MK phage than of
the MK phage bound to trypsin beads while at pH 7.5, a
2103-fold difference was observed. Hence fusion phage
displaying BPTI adhere not only to anhydro-trypsin beads
but also to active trypsin beads and can be recovered as
infectious phage. These data, in conjunction with
earlier findings, strongly suggest that BPTI displayed
on the surface of fusion phage is appropriately folded
and functional.

2) Generation of P1 Mutants of BPTI

To demonstrate the specificity of interaction of
BPTI-III fusion phage with immobilized serine proteases,
single amino acid substitutions were introduced at the
P1 position (residue 15 of mature BPTI) of the BPTI-III
fusion protein by site-directed mutagenesis. A 25mer
mutagenic oligonucleotide (P1) was designed to
substitute a LEU codon for the LYS15 codon. This
alteration is desired because BPTI(K15L) is a moderately
good inhibitor of human neutrophil elastase (HNE) (Kd =
2.9 x 10^-9 M) (BECK88b) and a poor inhibitor of trypsin. A
fusion phage displaying BPTI(K15L) should bind to
immobilized HNE but not to immobilized trypsin. BPTI-III
MK fusion phage would be expected to display the
opposite phenotype (bind to trypsin, fail to bind to
HNE). These observations would illustrate the binding
specificity of BPTI-III fusion phage for immobilized
serine proteases.

Mutagenesis of the P1 region of the BPTI-VIII gene
contained within the intergenic region of recombinant
phage MB46 was carried out using the Muta-Gene M13 In
Vitro Mutagenesis Kit (Bio-Rad, Richmond, CA). MB46
phage (7.5 x 10^6 pfu) were used to infect a 50 ml culture
of CJ236 cells (O.D.600 = 0.5). Following overnight
incubation at 37°C, phage were recovered and
uracil-containing single-stranded DNA was extracted from the
phage. The single-stranded DNA was further purified by
NACS chromatography as recommended by the manufacturer
(B.R.L., Gaithersburg, MD).

Two hundred nanograms of the purified single-stranded
DNA were annealed to 3 picomoles of the
phosphorylated 25mer mutagenic oligonucleotide (P1).
Following filling in with T4 DNA polymerase and ligation
with T4 DNA ligase, the sample was used to transfect
competent cells which were subsequently plated on LB
plates to permit the formation of plaques. Phage
derived from picked plaques were applied to a Nytran
membrane using a Schleicher and Schuell (Keene, NH)
minifold I apparatus (Dot Blot Procedure). Phage DNA
was immobilized onto the filter by baking at 80°C for 2
hours. The filter was bathed in 1 X Southern
pre-hybridization buffer (5Prime-3Prime, West Chester, PA)
for 2 hours. Subsequently, the filter was incubated in
1 X Southern hybridization solution (5Prime-3Prime)
containing a 21mer probing oligonucleotide (LEU1) which
had been radioactively labelled with gamma-32P-ATP
(N.E.N./DuPont, Boston, MA) by T4 polynucleotide kinase
(New England BioLabs (NEB), Beverly, MA). Following
overnight hybridization, the filter was washed 3 times
with 6 X SSC at room temperature and once at 60°C in 6 X
SSC prior to autoradiography. Clones exhibiting strong
hybridization signals were chosen for large scale Rf
preparation using the PZ523 spin column protocol
(5Prime-3Prime). Restriction enzyme analysis confirmed
that the structure of the Rf was correct and DNA
sequencing confirmed the substitution of a LEU codon
(TTG) for the LYS15 codon (AAA). This Rf DNA was
designated MB46(K15L).

3) Generation of the BPTI-III MA Vector 

The original gene III fusion phage MK can be
detected on the basis of its ability to transduce cells
to kanamycin resistance (Km^R). It was deemed
advantageous to generate a second gene III fusion vector
which can confer resistance to a different antibiotic,
namely ampicillin (Ap). One could then mix a fusion
phage conferring Ap^R while displaying engineered protease
inhibitor A (EPI-A) with a second fusion phage
conferring Km^R while displaying EPI-B. The mixture could
be added to an immobilized serine protease and,
following elution of bound fusion phage, one could
evaluate the relative affinity of the two EPIs for the
immobilized protease from the relative abundance of
phage that transduce cells to Km^R or Ap^R.

The Ap^R gene is contained in the vector pGem3Zf
(Promega Corp., Madison, WI) which can be packaged as
single stranded DNA contained in bacteriophage when
helper phage are added to bacteria containing this
vector. The recognition sites for restriction enzymes
SmaI and SnaBI were engineered into the 3' non-coding
region of the Ap^R (β-lactamase) gene using the technique
of synthetic oligonucleotide directed site specific
mutagenesis. The single stranded DNA was used as the
template for in vitro mutagenesis leading to the
following DNA sequence alterations (numbering as
supplied by Promega): a) to create a SmaI (or XmaI)
site, bases T1115 --> C and A1116 --> C, and b) to create a
SnaBI site, G1125 --> T, C1129 --> T, and T1130 --> A. The
alterations were confirmed by radiolabelled probe
analysis with the mutating oligonucleotide and
restriction enzyme analysis; this plasmid is named
pSGK3.

Plasmid pSGK3 was cut with AatII and SmaI and
treated with T4 DNA polymerase (NEB) to remove
overhanging 3' ends (MANI82, SAMB89). Phosphorylated
HindIII linkers (NEB) were ligated to the blunt ends of
the DNA and following HindIII digestion, the 1.1 kb
fragment was isolated by agarose gel electrophoresis
followed by purification on an Ultrafree-MC filter unit
as recommended by the manufacturer (Millipore, Bedford,
MA). M13-MB1/2-delta Rf DNA was cut with HindIII and
the linearized Rf was purified and ligated to the 1.1 kb
fragment derived from pSGK3. Ligation samples were used
to transfect competent cells which were plated on LB
plates containing Ap. Colonies were picked and grown in
LB broth containing Ap overnight at 37°C. Aliquots of
the culture supernatants were assayed for the presence
of infectious phage. Rf DNA was prepared from cultures
which were both Ap^R and contained infectious phage.
Restriction enzyme analysis confirmed that the Rf
contained a single copy of the Ap^R gene inserted into the
intergenic region of the M13 genome in the same
transcriptional orientation as the phage genes. This Rf
DNA was designated MA.

The 5.9 kb BglII/BsmI fragment from MA Rf DNA and
the 2.2 kb BglII/BsmI fragment from BPTI-III MK Rf DNA
were ligated together and a portion of the ligation
mixture was used to transfect competent cells which were
subsequently plated to permit plaque formation on a lawn
of cells. Large and small size plaques were observed on
the plates. Small size plaques were picked for further
analysis since BPTI-III fusion phage give rise to small
plaques due to impairment of gene III protein function.
Small plaques were added to LB broth containing Ap and
cultures were incubated overnight at 37°C. An Ap^R
culture which contained phage which gave rise to small
plaques when plated on a lawn of cells was used as a
source of Rf DNA. Restriction enzyme analysis confirmed
that the BPTI-III fusion gene had been inserted into the
MA vector. This Rf was designated BPTI-III MA.

4) Construction of BPTI(K15L)-III MA

MB46(K15L) Rf DNA was digested with XhoI and EagI
and the 125 bp DNA fragment was isolated by
electrophoresis on a 2% agarose gel followed by
extraction from an agarose slice by centrifugation
through an Ultrafree-MC filter unit. The 8.0 kb
XhoI/EagI fragment derived from BPTI-III MA Rf was also
prepared. The above two fragments were ligated and the
ligation sample was used to transfect competent cells
which were plated on LB plates containing Ap. Colonies
were picked and used to inoculate LB broth containing
Ap. Cultures were incubated overnight at 37°C and phage
within the culture supernatants were probed using the Dot
Blot Procedure. Filters were hybridized to a
radioactively labelled oligonucleotide (LEU1). Positive
clones were identified by autoradiography after washing
filters under high stringency conditions. Rf DNA was
prepared from Ap^R cultures which contained phage carrying
the K15L mutation. Restriction enzyme analysis and DNA
sequencing confirmed that the K15L mutation had been
introduced into the BPTI-III MA Rf. This Rf was
designated BPTI(K15L)-III MA. Interestingly,
BPTI(K15L)-III MA phage gave rise to extremely small
plaques on a lawn of cells and the infectivity of the
phage is 4 to 5 fold less than that of BPTI-III MK
phage. This suggests that the substitution of LEU for
LYS15 impairs the ability of the BPTI:gene III fusion
protein to mediate phage infection of bacterial cells.

5) Preparation of Immobilized Human Neutrophil 
Elastase 

One ml of Reacti-Gel 6X CDI activated agarose
(Pierce Chemical Co.) in acetone (200 µl packed beads)
was introduced into an empty Select-D spin column
(5Prime-3Prime). The acetone was drained out and the
beads were washed twice rapidly with 1.0 ml of ice cold
water and 1.0 ml of ice cold 100 mM boric acid, pH 8.5,
0.9% NaCl. Two hundred µl of 2.0 mg/ml human neutrophil
elastase (HNE) (CalBiochem, San Diego, CA) in borate
buffer were added to the beads. The column was sealed
and mixed end over end on a Labquake Shaker at 4°C for
36 hours. The HNE solution was drained off and the
beads were washed with ice cold 2.0 M Tris, pH 8.0 over
a 2 hour period at 4°C to block remaining reactive
groups. A 50% slurry of the beads in TBS/BSA was
prepared. To this was added an equal volume of sterile
100% glycerol and the beads were stored as a 25% slurry
at -20°C. Prior to use, the beads were washed 3 times
with TBS/BSA and a 50% slurry in TBS/BSA was prepared.

6) Characterization of the Affinity of BPTI-III MK and
BPTI(K15L)-III MA Phage for Immobilized Trypsin and
Human Neutrophil Elastase

Thirty µl of BPTI-III MK phage in TBS/BSA (1.7 x 10^11
pfu/ml) was added to 5 µl of a 50% slurry of either
immobilized human neutrophil elastase or immobilized
trypsin (Pierce Chemical Co.) also in TBS/BSA.
Similarly 30 µl of BPTI(K15L)-III MA phage in TBS/BSA
(3.2 x 10^10 pfu/ml) was added to either immobilized HNE or
trypsin. Samples were mixed on a Labquake shaker for 3
hours. The beads were washed with 0.5 ml of TBS/BSA for
5 minutes and recovered by centrifugation. The
supernatant was removed and the beads were washed 5
times with 0.5 ml of TBS/0.1% Tween-20. Finally, the
beads were resuspended in 0.5 ml of elution buffer (0.1
M HCl containing 1.0 mg/ml BSA adjusted to pH 2.2 with
glycine), mixed for 5 minutes and recovered by
centrifugation. The supernatant fraction was removed,
neutralized with 130 µl of 1 M Tris, pH 8.0, diluted in
LB broth, and titered for plaque-forming units on a lawn
of cells.
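
As a rough check on the inputs stated above, the number of plaque-forming units added to each bead sample follows from titer times volume. The sketch below is illustrative only: the function names are hypothetical and are not part of the original procedure, and the percent-bound helper simply assumes that recovered and input phage are both expressed in pfu.

```python
def input_pfu(titer_pfu_per_ml: float, volume_ul: float) -> float:
    """Plaque-forming units delivered by a given volume of phage stock."""
    return titer_pfu_per_ml * volume_ul / 1000.0  # 1 ml = 1000 µl


def percent_bound(recovered_pfu: float, added_pfu: float) -> float:
    """Recovered phage as a percentage of the input (hypothetical helper)."""
    if added_pfu <= 0:
        raise ValueError("input pfu must be positive")
    return 100.0 * recovered_pfu / added_pfu


# Titers and volumes as stated in the text above.
bpti_mk_input = input_pfu(1.7e11, 30)   # BPTI-III MK
k15l_ma_input = input_pfu(3.2e10, 30)   # BPTI(K15L)-III MA
print(f"BPTI-III MK input:       {bpti_mk_input:.2e} pfu")
print(f"BPTI(K15L)-III MA input: {k15l_ma_input:.2e} pfu")
```

Because the two stocks differ in titer, comparing the percentage of input recovered, rather than raw pfu, keeps the two phage on the same footing.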

Table 202 illustrates that 82 times more of the
BPTI-III MK input phage bound to the trypsin beads than
to the HNE beads. By contrast, the BPTI(K15L)-III MA
phage bound preferentially to HNE beads by a factor of
36. These results are consistent with the known
affinities of wild type and the K15L variant of BPTI for
trypsin and HNE. Hence BPTI-III fusion phage bind
selectively to immobilized proteases and the nature of
the BPTI variant displayed on the surface of the fusion
phage dictates which particular protease is the optimum
receptor for the fusion phage.

7) Effect of pH on the Dissociation of Bound BPTI-III
MK and BPTI(K15L)-III MA Phage from Immobilized

the shaker, recovered by centrifugation and the
supernatant was removed. The beads were washed with 0.5
ml of TBS/0.1% Tween-20 for 5 minutes and recovered by
centrifugation. Four additional washes with TBS/0.1%
Tween-20 were performed as described above. The beads
were washed as above with 0.5 ml of 100 mM sodium
citrate, pH 7.0 containing 1.0 mg/ml BSA. The beads
were recovered by centrifugation and the supernatant was
removed. Subsequently, the HNE beads were washed
sequentially with a series of 100 mM sodium citrate, 1.0
mg/ml BSA buffers of pH 6.0, 5.0, 4.0 and 3.0 and
finally with the pH 2.2 elution buffer described above.
The pH washes were neutralized by the addition of 1 M
Tris, pH 8.0, diluted in LB broth and titered for
plaque-forming units on a lawn of cells.

Table 203 illustrates that a low percentage of the
input BPTI-III MK fusion phage adhered to the HNE beads
and was recovered predominantly in the pH 7.0 and 6.0
washes. By contrast, a significantly higher
percentage of the BPTI(K15L)-III MA phage bound to the
HNE beads and was recovered predominantly in the pH 5.0
and 4.0 washes. Hence lower pH conditions (i.e. more
stringent) are required to dissociate BPTI(K15L)-III MA
than BPTI-III MK phage from immobilized HNE. The affinity
of BPTI(K15L) is over 1000 times greater than that of
BPTI for HNE (based on reported values (BECK88b)).
Hence this suggests that lower pH conditions are indeed
required to dissociate fusion phage displaying a BPTI
variant with a higher affinity for HNE.

8) Construction of BPTI(MGNG)-III MA Phage

The light chain of bovine inter-α-trypsin inhibitor
contains 2 domains highly homologous to BPTI. The amino
terminal proximal domain (called BI-8e) has been
generated by proteolysis and shown to be a potent
inhibitor of HNE (Kd = 4.4 x 10^-11 M) (ALBR83). By contrast
a BPTI variant with the single substitution of LEU for
LYS15 exhibits a moderate affinity for HNE (Kd = 2.9 x 10^-9
M) (BECK88b). It has been proposed that the P1 residue
is the primary determinant of the specificity and
potency of BPTI-like molecules (BECK88b, LASK80 and
works cited therein). Although both BI-8e and
BPTI(K15L) feature LEU at their respective P1 positions,
there is a 66 fold difference in the affinities of these
molecules for HNE. Structural features, other than the
P1 residue, must contribute to the affinity of BPTI-like
molecules for HNE.
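
The 66 fold figure can be reproduced directly from the two constants quoted above for HNE. The snippet is only a worked check of that arithmetic; the variable names are illustrative.

```python
# Constants for HNE as quoted in the text (units: M).
k_bi8e = 4.4e-11   # BI-8e (ALBR83)
k_k15l = 2.9e-9    # BPTI(K15L) (BECK88b)

# A smaller constant means tighter binding, so the ratio expresses how
# much more tightly BI-8e binds HNE than BPTI(K15L) does.
fold_difference = k_k15l / k_bi8e
print(f"{fold_difference:.1f}-fold")
```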

A comparison of the structures of BI-8e and
BPTI(K15L) reveals the presence of three positively
charged residues at positions 39, 41, and 42 of BPTI
which are absent in BI-8e. These hydrophilic and highly
charged residues of BPTI are displayed on a loop which
underlies the loop containing the P1 residue and is
connected to it via a disulfide bridge. Residues within
the underlying loop (in particular residue 39)
participate in the interaction of BPTI with the surface
of trypsin near the catalytic pocket (BLOW72) and may
contribute significantly to the tenacious binding of
BPTI to trypsin. However, these hydrophilic residues
might hamper the docking of BPTI variants with HNE. In
support of this hypothesis, BI-8e displays a high
affinity for HNE and contains no charged residues in the
region spanning residues 39-42. Hence residues 39
through 42 of wild type BPTI were replaced with the
corresponding residues of the human homologue of BI-8e.
We anticipated that a BPTI derivative containing the
MET-GLY-ASN-GLY (MGNG) sequence (SEQ ID NO:12) would
exhibit a higher affinity for HNE than corresponding
derivatives which retain the sequence of wild type BPTI
at residues 39-42.

A double stranded oligonucleotide with AccI and
EagI compatible ends was designed to introduce the
desired alteration of residues 39 to 42 via cassette
mutagenesis. Codon 45 was altered to create a new XmnI
site, unique in the structure of the BPTI gene, which
could be used to screen for mutants. This alteration at
codon 45 does not alter the encoded amino-acid sequence.
BPTI-III MA Rf DNA was digested with AccI. Two
oligonucleotides (CYSB and CYST) corresponding to the
bottom and top strands of the mutagenic DNA were
annealed and ligated to the AccI digested BPTI-III MA Rf
DNA. The sample was digested with BglII and the 2.1 kb
BglII/EagI fragment was purified. BPTI-III MA Rf was
also digested with BglII and EagI and the 6.0 kb
fragment was isolated and ligated to the 2.1 kb
BglII/EagI fragment described above. Ligation samples
were used to transfect competent cells which were plated
to permit the formation of plaques on a lawn of cells.
Phage derived from plaques were probed with a
radioactively labelled oligonucleotide (CYSB) using the
Dot Blot Procedure. Positive clones were identified by
autoradiography of the Nytran membrane after washing
under high stringency conditions. Rf DNA was prepared from Ap^R
cultures containing fusion phage which hybridized to the
CYSB probe. Restriction enzyme analysis and DNA
sequencing confirmed that codons 39-42 of BPTI had been
altered. The Rf DNA was designated BPTI(MGNG)-III MA
(the amino acid sequence MGNG has SEQ ID NO:12;
BPTI(MGNG)-III MA denotes a strain of M13 that
displays BPTI(MGNG) fused to the gIII protein
and that carries the bla gene that confers Ap^R).

9) Construction of BPTI(K15L,MGNG)-III MA

BPTI(MGNG)-III MA Rf DNA was digested with AccI and
the 5.6 kb fragment was purified. BPTI(K15L)-III MA was
digested with AccI and the 2.5 kb DNA fragment was
purified. The two fragments above were ligated together
and ligation samples were used to transfect competent
cells which were plated for plaque production. Large
and small plaques were observed on the plate.
Representative plaques of each type were picked and
phage were probed with the LEU1 oligonucleotide via the
Dot Blot Procedure. After the Nytran filter had been
washed under high stringency conditions, positive clones
were identified by autoradiography. Only the phage
which hybridized to the LEU1 oligonucleotide gave rise
to the small plaques, confirming an earlier observation
that substitution of LEU for LYS15 substantially reduces
phage infectivity. Appropriate cultures containing
phage which hybridized to the LEU1 oligonucleotide were
used to prepare Rf DNA. Restriction enzyme analysis and
DNA sequencing confirmed that the K15L mutation had been
introduced into BPTI(MGNG)-III MA. This Rf DNA was
designated BPTI(K15L,MGNG)-III MA.

10) Effect of Mutation of Residues 39-42 of BPTI(K15L)
on its Affinity for Immobilized HNE

Thirty µl of BPTI(K15L,MGNG)-III MA phage (9.2 x 10^9
pfu/ml in TBS/BSA) were added to 5 µl of a 50% slurry of
immobilized HNE also in TBS/BSA. Similarly 30 µl of
BPTI(K15L)-III MA phage (1.2 x 10^10 pfu/ml in TBS/BSA) were
added to immobilized HNE. The samples were incubated
for 3 hours on a Labquake shaker. The beads were washed
for 5 min with 0.5 ml of TBS/BSA and recovered by
centrifugation. The beads were washed 5 times with 0.5
ml of TBS/0.1% Tween-20 as described above. Finally,
the beads were washed sequentially with a series of 100
mM sodium citrate buffers of pH 7.0, 6.0, 5.5, 5.0,
4.75, 4.5, 4.25, 4.0 and 3.5 as described above. The pH
washes were neutralized, diluted in LB broth and titered
for plaque-forming units on a lawn of cells.

Table 204 illustrates that almost twice as much of
the BPTI(K15L,MGNG)-III MA as BPTI(K15L)-III MA phage
bound to HNE beads. In both cases the pH 4.75 fraction
contained the largest proportion of the recovered phage.
This confirms that replacement of residues 39-42 of wild
type BPTI with the corresponding residues of BI-8e
enhances the binding of the BPTI(K15L) variant to HNE.

11) Fractionation of a Mixture of BPTI-III MK and
BPTI(K15L,MGNG)-III MA Fusion Phage

The observations described above indicate that
BPTI(K15L,MGNG)-III MA and BPTI-III MK phage exhibit
different pH elution profiles from immobilized HNE. It
seemed plausible that this property could be exploited
to fractionate a mixture of different fusion phage.

Fifteen µl of BPTI-III MK phage (3.92 x 10^10 pfu/ml in
TBS/BSA), equivalent to 8.91 x 10^7 Km^R transducing units,
were added to 15 µl of BPTI(K15L,MGNG)-III MA phage
(9.85 x 10^9 pfu/ml in TBS/BSA), equivalent to 4.44 x 10^7 Ap^R
transducing units. Five µl of a 50% slurry of
immobilized HNE in TBS/BSA was added to the phage and
the sample was incubated for 3 hours on a Labquake
mixer. The beads were washed for 5 minutes with 0.5 ml
of TBS/BSA prior to being washed 5 times with 0.5 ml of
TBS/2.0% Tween-20 as described above. Beads were washed
for 5 minutes with 0.5 ml of 100 mM sodium citrate, pH
7.0 containing 1.0 mg/ml BSA. The beads were recovered
by centrifugation and the supernatant was removed.
Subsequently, the HNE beads were washed sequentially
with a series of 100 mM citrate buffers of pH 6.0, 5.0
and 4.0. The pH washes were neutralized by the addition
of 130 µl of 1 M Tris, pH 8.0.

The relative proportion of BPTI-III MK and
BPTI(K15L,MGNG)-III MA phage in each pH fraction was
evaluated by determining the number of phage able to
transduce cells to Km^R as opposed to Ap^R. Fusion phage
diluted in 1 X Minimal A salts were added to 100 µl of
cells (O.D.600 = 0.8 concentrated to 1/20 original
culture volume) also in Minimal A salts in a final volume
of 200 µl. The sample was incubated for 15 min at 37°C
prior to the addition of 200 µl of 2 X LB broth. After
an additional 15 min incubation at 37°C, duplicate
aliquots of cells were plated on LB plates containing
either Ap or Km to permit the formation of colonies.
Bacterial colonies on each type of plate were counted
and the data were used to calculate the number of Ap^R and
Km^R transducing units in each pH fraction. The number of
Ap^R transducing units is indicative of the amount of
BPTI(K15L,MGNG)-III MA phage in each pH fraction while
the total number of Km^R transducing units is indicative
of the amount of BPTI-III MK phage.

Table 205 illustrates that a low percentage of the
BPTI-III MK input phage (as judged by Km^R transducing
units) adhered to the HNE beads and was recovered
predominantly in the pH 7.0 fraction. By contrast, a
significantly higher percentage of the BPTI(K15L,MGNG)-III
MA phage (as judged by Ap^R transducing units) adhered
to the HNE beads and was recovered predominantly in the
pH 4.0 fraction. A comparison of the total number of Ap^R
and Km^R transducing units in the pH 4.0 fraction shows
that a 984-fold enrichment of BPTI(K15L,MGNG)-III MA
phage over BPTI-III MK phage was achieved. Hence, the
above procedure can be utilized to fractionate mixtures
of fusion phage on the basis of their relative
affinities for immobilized HNE.
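
One common way to express such an enrichment is the ratio of the two phage in the eluted fraction divided by their ratio in the input. The text does not state which formula was used to obtain the 984-fold figure, so the sketch below is an assumption, and the function name and the eluted counts in the usage line are hypothetical placeholders rather than values from Table 205. Only the two input values come from the text.

```python
def enrichment(target_out: float, other_out: float,
               target_in: float, other_in: float) -> float:
    """Fold enrichment of the target phage over the other phage.

    Assumed definition: (target/other in the eluted fraction)
    divided by (target/other in the input mixture).
    """
    if min(other_out, target_in, other_in) <= 0:
        raise ValueError("counts used as denominators must be positive")
    return (target_out / other_out) / (target_in / other_in)


# Input transducing units as stated in the text.
AP_R_INPUT = 4.44e7   # BPTI(K15L,MGNG)-III MA
KM_R_INPUT = 8.91e7   # BPTI-III MK

# Hypothetical eluted counts, for illustration only.
example = enrichment(target_out=5.0e4, other_out=1.0e2,
                     target_in=AP_R_INPUT, other_in=KM_R_INPUT)
print(f"{example:.0f}-fold enrichment (illustrative counts)")
```

Normalizing to the input ratio matters here because the two phage were not mixed in equal numbers of transducing units.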

12) Construction of BPTI(K15V,R17L)-III MA

A BPTI variant containing the alterations K15V and
R17L demonstrates the highest affinity for HNE of any
BPTI variant described to date (Kd = 6 x 10^-11 M) (AUER89).
As a means of testing the selection system described
herein, a fusion phage displaying this variant of BPTI
was generated and used as a "reference" phage to
characterize the affinity for immobilized HNE of fusion
phage displaying a BPTI variant with a known affinity
for free HNE. A 76 bp mutagenic oligonucleotide (VAL1)
was designed to convert the LYS15 codon (AAA) to a VAL
codon (GTT) and the ARG17 codon (CGA) to a LEU codon
(CTG). At the same time codons 11, 12 and 13 were
altered to destroy the ApaI site resident in the wild
type BPTI gene while creating a new RsrII site, which
could be used to screen for correct clones.

The single stranded VAL1 oligonucleotide was 
converted to the double stranded form following the 
procedure described in Current Protocols in Molecular
Biology (AUSU87) . One ^ of the VAL1 oligonucleotide 
were added to 40 µl of electro-competent cells which
were shocked using a Bio-Rad Gene Pulser device set at
1.7 kV, 25 µF and 800 Ω. One ml of SOC media was
immediately added to the cells which were allowed to
recover at 37°C for one hour. Aliquots of the
electroporated cells were plated onto LB plates
containing Ap to permit the formation of colonies.

Phage contained within cultures derived from picked
Ap^R colonies were probed with two radiolabelled
oligonucleotides (PRP1 and ESP1) via the Dot Blot
Procedure. Rf DNA was prepared from cultures containing
phage which exhibited a strong hybridization signal with
the ESP1 oligonucleotide but not with the PRP1
oligonucleotide. Restriction enzyme analysis verified
loss of the ApaI site and acquisition of a new RsrII
site diagnostic for the changes in the P1 region.
Fusion phage were also probed with a radiolabelled
oligonucleotide (VLP1) via the Dot Blot Procedure.
Autoradiography confirmed that fusion phage which
previously failed to hybridize to the PRP1 probe
hybridized to the VLP1 probe. DNA sequencing confirmed
that the LYS15 and ARG17 codons had been converted to VAL
and LEU codons respectively. The Rf DNA was designated
BPTI(K15V,R17L)-III MA.

13) Affinity of BPTI(K15V,R17L)-III MA Phage for
Immobilized HNE

Forty µl of BPTI(K15V,R17L)-III MA phage (9.8 x 10^10
pfu/ml) in TBS/BSA were added to 10 µl of a 50% slurry
required to dissociate, from immobilized HNE, fusion
phage displaying a BPTI variant with a higher affinity
for free HNE.

* * * 

EXAMPLE IV 

CONSTRUCTION OF A VARIEGATED POPULATION OF PHAGE
DISPLAYING BPTI DERIVATIVES AND FRACTIONATION FOR MEMBERS
THAT DISPLAY BINDING DOMAINS HAVING HIGH AFFINITY FOR
HUMAN NEUTROPHIL ELASTASE:

We here describe generation of a library of 1000
different potential engineered protease inhibitors
(PEPIs) and the fractionation with immobilized HNE to
obtain an engineered protease inhibitor (Epi) having
high affinity for HNE. Successful Epis that bind HNE
are designated EpiNEs.

1) Design of a Mutagenic Oligonucleotide to Create a 
Library of Fusion Phage 

A 76 bp variegated oligonucleotide (MYMUT) was 
designed to construct a library of fusion phage 
displaying 1000 different PEPIs derived from BPTI. The 
oligonucleotide contains 1728 different DNA sequences 
but due to the degeneracy of the genetic code, it 
encodes 1000 different protein sequences. The 
oligonucleotide was designed so as to destroy an ApaI
site (shown in Table 113) encompassing codons 12 and 13.
ApaI digestion could be used to select against the
parental Rf DNA used to construct the library. 

The MYMUT oligonucleotide permits the substitution
of 5 hydrophobic residues (PHE, LEU, ILE, VAL, and MET
via a DTS codon (D = approximately equimolar A, T, and
G; S = approximately equimolar C and G)) for LYS15.
Replacement of LYS15 in BPTI with aliphatic hydrophobic
residues via semi-synthesis has provided proteins having
higher affinity for HNE than BPTI (TANK77, JERI74a,b,
WENZ80, TSCH86, BECK88b). At position 16, either GLY or
ALA is permitted (GST codon). This is in keeping with
the predominance of these two residues at the
corresponding positions in a variety of BPTI homologues
(CREI87). The variegation scheme at position 17 is
identical to that at 15. Limited data are available on
the relative contribution of this residue to the
interaction of BPTI homologues with HNE. A variety of
hydrophobic residues at position 17 was included with
the anticipation that they would enhance the docking of
a BPTI variant with HNE. Finally at positions 18 and
19, 4 (PHE, SER, THR, and ILE via a WYC codon (W =
approximately equimolar A and T; Y = approximately
equimolar T and C)) and 5 (SER, PRO, THR, LYS, GLN, and
stop via an HMA codon (H = approximately equimolar A, C,
and T; M = approximately equimolar A and C)) different
amino acids respectively are encoded. These different
amino acid residues are found in the corresponding
positions of BPTI homologues that are known to bind to
HNE (CREI87). Although the amino acids included in the
PEPI library were chosen because there was some
indication that they might facilitate binding to HNE, it
was not and is not possible to predict which combination
of these amino acids will lead to high affinity for HNE.
The mutagenic oligonucleotide MYMUT was synthesized by
Genetic Design Inc. (Houston, Texas).
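
The counts of 1728 DNA sequences and 1000 protein sequences can be checked by expanding the degenerate codons given above for positions 15 through 19 and translating them with the standard genetic code. The following is a minimal sketch: the names `SCHEME`, `expand`, and `residues` are illustrative, and stop codons are excluded from the protein count on the assumption that this is how the figure of 1000 was reached.

```python
from itertools import product
from math import prod

# Degenerate base symbols used in the text, plus the four plain bases.
MIX = {"A": "A", "C": "C", "G": "G", "T": "T",
       "D": "AGT", "S": "CG", "W": "AT", "Y": "CT", "H": "ACT", "M": "AC"}

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Variegated codon at each BPTI position, as described in the text.
SCHEME = {15: "DTS", 16: "GST", 17: "DTS", 18: "WYC", 19: "HMA"}


def expand(degenerate_codon):
    """All concrete codons represented by one degenerate codon."""
    return ["".join(bases)
            for bases in product(*(MIX[s] for s in degenerate_codon))]


def residues(degenerate_codon):
    """Distinct amino acids encoded (one-letter codes), stops excluded."""
    return {CODON_TABLE[c] for c in expand(degenerate_codon)} - {"*"}


dna_variants = prod(len(expand(c)) for c in SCHEME.values())
protein_variants = prod(len(residues(c)) for c in SCHEME.values())

for position, codon in SCHEME.items():
    print(position, codon, sorted(residues(codon)))
print("DNA sequences:", dna_variants)
print("Protein sequences (stops excluded):", protein_variants)
```

The gap between the two totals arises because DTS encodes VAL twice (GTC and GTG) and HMA includes one stop codon (TAA).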

2) Construction of Library of Fusion Phage Displaying 
Potential Engineered Protease Inhibitors 

The single-stranded mutagenic MYMUT DNA was
converted to the double stranded form with compatible
XhoI and StuI ends and dephosphorylated with HK(TM)
phosphatase as described above for the VAL1
oligonucleotide. BPTI(MGNG)-III MA Rf DNA was digested
with XhoI and StuI for 3 hours at 37°C to ensure
complete digestion. The 8.0 kb DNA fragment was
purified by agarose gel electrophoresis and Ultrafree-MC
unit filtration. One µl of the dephosphorylated MYMUT
DNA (5 ng) was ligated to 50 ng of the 8.0 kb fragment
derived from BPTI(MGNG)-III MA Rf DNA. Under these
conditions, the 10:1 molar ratio of insert to vector was
found to be optimal for the generation of transformants.
Ligation samples were extracted with phenol,
phenol/chloroform/IAA (25:24:1, v:v:v) and
chloroform/IAA (24:1, v:v) and DNA was ethanol
precipitated prior to electroporation. One µl of the
recovered ligation DNA was added to 40 µl of
electro-competent cells. Cells were shocked using a Bio-Rad
Gene Pulser device as described above. Immediately
following electroshock, 1.0 ml of SOC media was added to
the cells which were allowed to recover at 37°C for 60
minutes with shaking. The electroporated cells were
plated onto LB plates containing Ap to permit the
formation of colonies.
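
The stated 10:1 molar ratio is consistent with the masses and fragment lengths given above. For two double-stranded fragments the mass per base pair cancels, so moles are proportional to mass divided by length. The helper below is a sketch under that assumption; its name is illustrative.

```python
def insert_to_vector_molar_ratio(insert_ng: float, insert_bp: float,
                                 vector_ng: float, vector_bp: float) -> float:
    """Molar ratio of insert to vector for two double-stranded fragments.

    Moles are proportional to mass / length when both fragments are
    double-stranded, so the average mass per base pair cancels out.
    """
    return (insert_ng / insert_bp) / (vector_ng / vector_bp)


# 5 ng of the 76 bp MYMUT cassette and 50 ng of the 8.0 kb vector fragment.
ratio = insert_to_vector_molar_ratio(5, 76, 50, 8000)
print(f"insert:vector is about {ratio:.1f}:1")
```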

To assess the efficiency of the cassette
mutagenesis procedure, 39 transformants were picked at
random and phage present in culture supernatants were
applied to a Nytran membrane and probed using the Dot
Blot Procedure. Two Nytran membranes were prepared in
this manner. The first filter was allowed to hybridize
to the CYSB oligonucleotide which had previously been
radiolabelled. The second membrane was allowed to
hybridize to the PRP1 oligonucleotide which had also
been radiolabelled. Filters were subjected to
autoradiography following washing under high stringency
conditions. Of the 39 phage samples applied to the
membrane, all 39 hybridized to the CYSB probe. This
indicated that there was fusion phage in the culture
supernatants and that at least the DNA encoding residues
35-47 appeared to be present in the phage genomes. Only
11 of the 39 samples hybridized to the PRP1
oligonucleotide, indicating that 28% of the transformants
were probably the parental phage BPTI(MGNG)-III MA used
to generate the library. The remaining 28 clones failed
to hybridize to the PRP1 probe, indicating that
substantial alterations were introduced into the P1
region by cassette mutagenesis using the MYMUT
oligonucleotide. Of these 28 samples, all were found to
contain infectious phage, indicating that mutagenesis did
not result in frame shift mutations which would lead to
the generation of defective gene III products and
non-infectious phage. (These 28 PEPI-displaying phage
constitute a mini-library, the fractionation of which is
discussed below.) Hence the overall efficiency of
mutagenesis was estimated to be 72% in those cases where
ligation DNA was not subjected to ApaI digestion prior
to electroporation.
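
The percentages quoted above follow directly from the dot blot counts. The short Python sketch below reproduces that arithmetic; the variable names are illustrative only and are not part of the original procedure.

```python
# Dot blot tally quoted in the text: 39 clones picked at random,
# 11 of which hybridized to the PRP1 (parental) probe.
picked = 39
parental_like = 11                   # PRP1-positive clones
mutant = picked - parental_like      # PRP1-negative clones

parental_fraction = parental_like / picked
mutagenesis_efficiency = mutant / picked

print(f"mutant clones: {mutant}")
print(f"parental fraction: {parental_fraction:.0%}")
print(f"mutagenesis efficiency: {mutagenesis_efficiency:.0%}")
```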

Bacterial colonies were harvested by overlaying
chilled LB plates containing Ap with 5 ml of ice cold LB
broth and scraping off cells using a sterile glass rod.
A total of 4899 transformants were harvested in this
manner, of which 3299 were obtained by electroporation of
ligation samples which were not digested with ApaI.
Hence we estimate that 72% of these transformants (i.e.
2375) represent mutants of the parental BPTI(MGNG)-III
MA phage derived by cassette mutagenesis of the P1
position. An additional 1600 transformants were
obtained by electroporation of ligation samples which
had been digested with ApaI. If we assume that all of
these clones contain new sequences at the P1 position
then the total number of mutants in the pool of 4899
transformants is estimated to be 2375 + 1600 = 3975.
The total number of potentially different DNA sequences
in the MYMUT library is 1728. We calculate that the

for plaque-forming units on a lawn of cells. The total
amount of fusion phage (as judged by pfu) appearing in
each pH wash fraction was determined.

Figure 7 illustrates that the largest percentage of
input phage which bound to the HNE beads was recovered
in the pH 5.0 fraction. The elution peak exhibits a
trailing edge on the low pH side suggesting that a small
proportion of the total bound fusion phage might elute
from the HNE beads at a pH < 5. BPTI(K15L)-III phage
display a BPTI variant with a moderate affinity for HNE
(Kd = 2.9·10⁻⁹ M) (BECK88b). Since BPTI(K15L)-III phage
elute from HNE beads as a peak centered on pH 4.75 and
the highest peak in the first passage of the
mini-library over HNE beads is centered on pH 5.0, we infer
that many members of the MYMUT PEPI mini-library display
PEPIs having moderate to high affinity for HNE.

To enrich for fusion phage displaying the highest
affinity for HNE, phage contained in the lowest pH
fraction (pH 2.0) from the first enrichment cycle were
amplified and subjected to a second round of
fractionation. Amplification involved the Transduction
Procedure described above. Fusion phage (2000 pfu) were
incubated with 100 µl of cells for 15 minutes at 37 °C in
200 µl of 1 X Minimal A salts. Two hundred µl of 2 X LB
broth was added to the sample and cells were allowed to
recover for 15 minutes at 37 °C with shaking. One
hundred µl portions of the above sample were plated onto
LB plates containing Ap. Five such transduction
reactions were performed yielding a total of 20 plates,
each containing approximately 350 colonies (7000
transformants in total). Bacterial cells were harvested
as described for the preparation of the MYMUT library
and fusion phage were collected as described for the
preparation of the mini-library. A total of 200 µl of
fusion phage (4.3·10¹² pfu/ml in TBS/BSA) derived from
the pH 2.0 fraction from the first passage of the
mini-library was obtained in this manner.
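
The plate and colony counts in the amplification step can be checked with a few lines of Python. This is only a bookkeeping sketch: it assumes that the 200 µl incubation volume already includes the 100 µl of cells, so that each reaction totals 400 µl after the broth is added, and the variable names are hypothetical.

```python
# Quantities quoted in the text for the amplification step.
phage_per_reaction_pfu = 2000
incubation_volume_ul = 200   # assumed to already include the 100 µl of cells
broth_volume_ul = 200        # 2 X LB broth added after the incubation
plated_portion_ul = 100
reactions = 5
colonies_per_plate = 350     # approximate figure from the text

plates_per_reaction = (incubation_volume_ul + broth_volume_ul) // plated_portion_ul
total_plates = plates_per_reaction * reactions
total_transformants = total_plates * colonies_per_plate
yield_per_input_pfu = total_transformants / (phage_per_reaction_pfu * reactions)

# Total phage in the amplified stock: 200 µl at 4.3e12 pfu/ml.
stock_pfu = 4.3e12 * (200 / 1000)

print(f"plates: {total_plates}, transformants: {total_transformants}")
print(f"transformants per input pfu: {yield_per_input_pfu:.0%}")
print(f"amplified stock: {stock_pfu:.1e} pfu")
```

Under that volume assumption the sketch recovers the 20 plates and 7000 transformants stated in the text.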
b) Second Enrichment Cycle 

Forty µl of the above phage stock was added to 10
µl of a 50% slurry of HNE beads in TBS/BSA. The sample
was allowed to mix for 1.5 hours and the HNE beads were
washed with TBS/BSA, TBS/0.5% Tween and sodium citrate
buffers as described above. Aliquots of neutralized pH
fractions were diluted and titered as described above.

The elution profile for the second passage of the
mini-library over HNE beads is shown in Figure 7. The
largest percentage of the input phage which bound to the
HNE beads was recovered in the pH 3.5 wash. A smaller
peak centered on pH 4.5 may represent residual fusion
phage from the first passage of the mini-library which
eluted at pH 5.0. The percentage of total input phage
which eluted at pH 3.5 in the second cycle exceeds the
percentage of input phage which eluted at pH 5.0 in the
first cycle. This is indicative of more avid binding of
fusion phage to the HNE matrix. Taken together, the
significant shift in the pH elution profile suggests
that selection for fusion phage displaying BPTI variants
with higher affinity for HNE occurred.
c) Third Cycle 

Phage obtained in the pH 2.0 fraction from the
second passage of the mini-library were amplified as
above and subjected to a third round of fractionation.
The pH elution profile is shown in Figure 7. The
largest percentage of input phage was recovered in the
pH 3.5 wash, as is the case with the second passage of
the mini-library. However, the minor peak centered on
pH 4.5 is diminished in the third passage relative to
the second passage. Furthermore, the percentage of
input phage which eluted at pH 3.5 is greater in the
third passage than in the second passage. In
comparison, the BPTI(K15V,R17L)-III fusion phage elute
from HNE beads as a peak centered on pH 4.25. Taken
together, the data suggest that a significant selection
for fusion phage displaying PEPIs with high affinity for
HNE occurred. Furthermore, since more extreme pH
conditions are required to elute fusion phage in the
third passage of the MYMUT library relative to those
conditions needed to elute BPTI(K15V,R17L)-III MA phage,
this suggests that those fusion phage which appear in
the pH 3.5 fraction may display a PEPI with a higher
affinity for HNE than the BPTI(K15V,R17L) variant (i.e.
Kd < 6·10⁻¹¹ M).
d) Characterization of Selected Fusion Phage

The pH 2.0 fraction from the third passage of the
mini-library was titered and plaques were obtained on a
lawn of cells. Twenty plaques were picked at random and
phage derived from plaques were probed with the CYSB
oligonucleotide via the Dot Blot Procedure.
Autoradiography of the filter revealed that all 20
samples gave a positive hybridization signal indicating
that fusion phage were present and the DNA encoding
residues 35 to 47 of BPTI(MGNG) is contained within the
recombinant M13 genomes. RF DNA was prepared for the 20
clones and initial dideoxy sequencing revealed that 12
clones were identical. This sequence was designated
EpiNEα (SEQ ID NO:45 and SEQ ID NO:108) (Table 207). No
DNA sequence changes were observed apart from the
planned variegation. Hence the cassette mutagenesis
procedure preserved the context of the planned
variegation of the pepi gene. The Dot Blot Procedure
was employed to probe all 20 selected clones from the pH
2.0 fraction from the third passage of the mini-library
with an oligonucleotide homologous to the sequence of
EpiNEα. Following high stringency washing,
autoradiography revealed that all 20 selected clones
were identical in the P1 region. Furthermore dot blot
analysis revealed that of the 28 different phage samples
pooled to create the mini-library, only one contained
the EpiNEα sequence. Hence in just three passes of the
mini-library over HNE beads, 1 out of 28 input fusion
phage was selected for and appears as a pure population
in the lowest pH fraction from the third passage of the
library. That the EpiNEα phage elute at pH 3.5 while
BPTI(K15V,R17L)-III MA phage elute at a higher pH
strongly suggests that the EpiNEα protein has a
significantly higher affinity than BPTI(K15V,R17L) for
HNE.

4) Fractionation of the MYMUT Library 
a) Three cycles of enrichment 

The same procedure used above to fractionate the
mini-library was used to fractionate the entire MYMUT
PEPI library consisting of fusion phage displaying 1000
different proteins. The phage inputs for the first,
second and third rounds of fractionation were 4.0·10¹¹,
5.8·10¹⁰, and 1.1·10¹¹ pfu respectively. Figure 8
illustrates that the largest percentage of input phage
which bound to the HNE matrix was recovered in the pH
5.0 wash in the first enrichment cycle. The pH elution
profile is very similar to that seen for the first
passage of the mini-library over HNE beads. A trailing
edge is also observed on the low pH side of the pH 5.0
peak; however, this is not as prominent as that observed
for the mini-library. The percentage of input phage
which eluted in the pH 7.0 wash was greater than that
eluted in the pH 6.0 wash. This is in contrast to the
result obtained for the first passage of the
mini-library and may reflect the presence of ≈20% parental
BPTI(MGNG)-III MA phage in the MYMUT library pool.

and elute in the pH 7.0 fraction. That no parent phage

BPTI variants in the MYMUT library.

An examination of the sequences of the EpiNE ...
is illuminating. A strong preference for either ...
is indicated with ... library at ... MET at ...
residue at this position but it only appears when VAL is
present at position 15. At position 18 PHE was observed
in all 20 clones sequenced even though the MYMUT
oligonucleotide is capable of encoding other residues at
this position. This result is quite surprising and
could not be predicted from previous mutational analysis
of BPTI, model building, or on any theoretical grounds.
We infer that the presence of PHE at position 18
significantly enhances the ability of each of the EpiNEs to
bind to HNE. Finally at position 19, PRO appears in 10
of 20 codons while SER, the second most prominent
residue, appears at 6 of 20 codons. Of the residues
targeted for mutagenesis in the present study, residue
19 is the nearest to the edge of the interaction surface
of a PEPI with HNE. Nevertheless, a preponderance of
PRO is observed and may indicate that PRO at 19, like
PHE at 18, enhances the binding of these proteins to
HNE. Interestingly, EpiNE5 appears only once and
differs from EpiNE1 only at position 19; similarly,
EpiNE6 differs from EpiNE3 only at position 19. These
alterations may have only a minor effect on the ability
of these proteins to interact with HNE. This is
supported by the fact that the pH elution profiles for
EpiNE5 and EpiNE6 are very similar to those of EpiNE1
and EpiNE3 respectively.

Only EpiNE2 and EpiNE8 exhibit pH profiles which
differ from those of the other selected clones. Both
clones contain LYS at position 19 which may restrict the
interaction of BPTI with HNE. However, we can not
exclude the possibility that other alterations within
EpiNE2 and EpiNE8 (R15L and Y21S respectively) influence
their affinity for HNE.

EpiNE7 was expressed as a soluble protein and
analyzed for HNE inhibition activity by the fluorometric
assay of Castillo et al. (CAST79); the data were
analyzed by the method of Green and Work (GREE53).
Preliminary results indicate that Kd(HNE, EpiNE7) ≤
8·10⁻¹² M, i.e. at least 7.5-fold lower than the lowest Kd
reported for a BPTI derivative with respect to HNE.
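
The 7.5-fold figure can be reproduced with the short Python sketch below. It assumes that the reference value is the Kd bound of 6·10⁻¹¹ M quoted above for the BPTI(K15V,R17L) variant; that identification is an inference from the numbers, and the variable names are illustrative.

```python
# Assumed reference: the Kd bound quoted for BPTI(K15V,R17L).
kd_reference_M = 6e-11
# Preliminary upper bound quoted for EpiNE7.
kd_epine7_max_M = 8e-12

fold_lower = kd_reference_M / kd_epine7_max_M
print(f"EpiNE7 Kd is at least {fold_lower:.1f}-fold lower")
```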
C. Summary

Taken together, these data show that the
alterations which appear in the P1 region of the EPI
mutants confer the ability to bind to HNE and hence be
selected through the fractionation process. That the
sequences of EpiNE1, EpiNE3, and EpiNE7 appear
frequently in the population of selected clones suggests
that these clones display BPTI variants with the highest
affinity for HNE of any of the 1000 potentially
different variants in the MYMUT library. Furthermore,
that pH conditions less than 4.0 are required to elute
these fusion phage from immobilized HNE suggests that
they display BPTI variants having a higher affinity for
HNE than BPTI(K15V,R17L). EpiNE7 exhibits a lower Kd
toward HNE than does BPTI(K15V,R17L); EpiNE1 and EpiNE3
are also expected to exhibit lower Kds for HNE
than BPTI(K15V,R17L). It is possible that all of the
listed EpiNEs have lower Kds than BPTI(K15V,R17L).

Position 18 has not previously been identified as a 
key position in determining specificity or affinity of 
aprotinin homologues or derivatives for particular 
serine proteases. No one has reported or suggested that
phenylalanine at position 18 will confer specificity and
high affinity for HNE. One of the powerful advantages
of the present invention is that many diverse amino-acid 
sequences may be tested simultaneously. 

EXAMPLE V 

SCREENING OF THE MYMUT LIBRARY FOR BINDING TO CATHEPSIN 
G BEADS. 

We fractionated the MYMUT library over immobilized 
human Cathepsin G to find an engineered protease 
inhibitor having high affinity for Cathepsin G, 
hereafter designated as an EpiC. The details of phage 
binding, elution of bound phage with buffers of 
decreasing pH (pH profile) , titering of the phage 
contained in these fractions, composition of the MYMUT 
library, and the preparation of cathepsin G (Cat G) 
beads are essentially the same as detailed in Example 
IV. 

The pH profiles for the binding of two starting
controls, BPTI-III MK and EpiNE1, are shown in Figure
10. BPTI-III MK phage, which contains wild type BPTI
fused to the III gene product, shows no apparent binding
to Cat G beads in this assay. EpiNE1 phage was obtained
by enrichment with HNE beads (Example IV and Table 208).
EpiNE1-III MK demonstrated little binding to Cat G beads
in the assay, although a small peak or shoulder is
visible in the pH 5 eluted fraction.

Figure 11 shows the pH profiles of the MYMUT
library phage when bound to Cat G beads. Library-Cat G
interaction was monitored using three cycles of binding,
pH elution, transduction of the pH 2 eluted phage,
growth of the transduced phage and rebinding of any
selected phage to Cat G beads, in an exact copy of that
used to find variants of BPTI which bound to HNE. In
contrast to the pH profiles elicited with HNE beads,
little enhancement of binding was observed for the same
phage library when cycled with Cat G beads (with the
exception of a possible 'shoulder' developing in the pH 5
elutions).

To investigate the elution profile around the pH 5 
point in more detail, the binding of phage taken from 
the pH 4 eluted fraction (bound to Cat G beads) rather 
than the previously used pH 2 fraction was examined. 
Figure 12 demonstrates a marked enhancement of phage 
binding to the Cat G beads with an apparent elution peak 
of pH 5. The binding, as a fraction of the input phage 
population, increased with subsequent binding and 
elution cycles. 

Individual phage clones were picked, grown and
analyzed for binding to Cat G beads. Figure 13 shows
the binding and pH profiles for the individual Cat G
binding clones (designated EpiC variants). All clones
exhibited minor peaks, superimposed upon a gradual fall
in bound phage, at pH elutions of 5 (clones 1 (SEQ ID
NOs:54 and 117), 8 (SEQ ID NOs:56 and 119), 10 (SEQ ID
NOs:57 and 120) and 11 (SEQ ID NOs:54 and 117)) or pH
4.5 (clone 7 (SEQ ID NOs:55 and 118)).

DNA sequencing of the EpiC clones, shown in Table
209 (SEQ ID NOs:54 through 58 and 117 through 121),
demonstrated that the clones selected for binding to Cat
G beads represented a distinct subset of the available
sequences in the MYMUT library and a cluster of
sequences different from that obtained when enriched
with HNE beads. The P1 residue in the EpiC mutants is
predominantly MET, with one example of PHE, while in
BPTI it is LYS and in the EpiNE variants it is either
VAL or LEU. In the EpiC mutants residue 16 is
predominantly ALA with one example of GLY and residue 17
is PHE, ILE or LEU. Interestingly residues 16 and 17
appear to pair off by complementary size, at least in
this small sample. The small GLY residue pairs with the
bulky PHE while the relatively larger ALA residue pairs
with the less bulky LEU and ILE. The majority of the
available residues in the MYMUT library for positions 18
and 19 are represented in the EpiC variants.

Hence, a distinct subset of related sequences from
the MYMUT library has been selected for and
demonstrated to bind to Cat G. A comparison of the pH
profiles elicited for the EpiC variants with Cat G and
the EpiNE variants for HNE indicates that the EpiNE
variants have a high affinity for HNE while the EpiC
variants have a moderate affinity for Cat G.
Nonetheless, the starting molecule, BPTI, has virtually
no detectable affinity for Cat G and the selection of
clones with a moderate affinity is a significant
finding.

EXAMPLE VI 

SECOND ROUND OF VARIEGATION OF EpiNE7 TO ENHANCE BINDING
TO HNE

A. MUTAGENESIS OF EpiNE7 PROTEIN IN THE LOOP
COMPRISING RESIDUES 34-41

In Example IV, we described engineered protease
inhibitors EpiNE1 through EpiNE8 (SEQ ID NOs:46 through
53 and 109-116) that were obtained by
affinity-selection. Modeling of the structure of the
BPTI-Trypsin complex (Brookhaven Protein Data Bank entry
1TPA) indicates that the EpiNE protein surface that
interacts with HNE is formed not only by residues 15-19
but also by residues 34-40 that are brought close to
this primary loop when the protein folds (HUBE74,
HUBE75, OAST88). Acting upon this assumption, we
changed amino acid residues in a second loop of the
EpiNE7 protein to find EpiNE7 (SEQ ID NO:48) derivatives
having higher affinity for HNE.

In the complex of BPTI and trypsin found in
Brookhaven Protein Data Bank entry 1TPA ("1TPA
complex"), VAL34 contacts TYR151 and GLN192. (Residues in
trypsin or HNE are underscored to distinguish them from
the inhibitor.) In HNE, the corresponding residues are
ILE151 and PHE192. ILE is smaller and more hydrophobic
than TYR. PHE is larger and more hydrophobic than GLN.
Neither of the HNE side groups has the possibility to
form hydrogen bonds. When side groups larger than that
of VAL are substituted at position 34, interactions with
residues other than 151 and 192 may be possible. In
particular, an acidic residue at 34 might interact with
ARG147 of HNE that corresponds to SER147 of trypsin in
1TPA. Table 15 shows that, in 59 homologues of BPTI, 13
different amino acids have been seen at position 34.
Thus we allow all twenty amino acids at 34.

Position 36 is not highly varied; only GLY, SER,
and ARG have been observed with GLY by far the most
prevalent. In the 1TPA complex, GLY36 contacts HIS57 and
GLN192. HIS57 is conserved and GLN192 corresponds to PHE192
of HNE. Adding a methyl group to GLY36 could increase
hydrophobic interactions with PHE192 of HNE. GLY36 is in
a conformation that most amino acids can achieve: φ
= -79° and ψ = -9° (Deisenhofer, cited in CREI84,
p. 222).

In the 1TPA complex, ARG39 contacts SER96, ASN97,
THR98, LEU99 (SEQ ID NO:13), GLN175, and TRP215. In HNE,
all of the corresponding residues are different: SER96
is deleted; ASN97 corresponds to ASP97 (bearing a negative
charge); THR98 corresponds to PRO98; LEU99 corresponds to
the residues VAL99, ASN99a, and LEU99b; GLN175 is deleted;
and TRP215 corresponds to PHE215. Position 39 shows a
moderately high degree of variability with 7 different
amino acids observed, viz. ARG, GLY, LYS, GLN, ASP, PRO,
and MET. Having seen PRO (the most rigid amino acid),
GLY (the most flexible amino acid), LYS and ASP (basic
and acidic amino acids), we assume that all amino acids
are structurally compatible with the aprotinin backbone.
Because the context of residue 39 has changed so much,
we allow all 20 amino acids.

Position 40 is not highly variable; only GLY and
ALA have been observed (with similar frequency, 24:16).
Position 41 is moderately varied, showing ASN, LYS, ASP,
GLN, HIS, GLU, and TYR. The side groups of residues 40
and 41 are not thought to contact trypsin in the 1TPA
complex. Nevertheless, these residues can exert
electrostatic effects and can influence the dynamic
properties of residues 39, 38, and others. The choice
of residues 34, 36, 39, 40, and 41 to be varied
simultaneously illustrates the rule that the varied
residues should be able to touch one molecule of the
target material at one time or be able to influence
residues that touch the target. These residues are not
contiguous in sequence, nor are they contiguous on the
surface of EpiNE7. They can, nonetheless, all influence
the contacts between the EpiNE and HNE.

Amino acid residues VAL34, GLY36, MET39, GLY40, and
ASN41 were variegated as follows: any of 20 genetically
encodable amino acids at positions 34 and 39 (NNS codons
in which N is approximately equimolar A,C,T,G and S is
approximately equimolar C and G), GLY or ALA at position
36 and 40 (GST codon), and [ASP, GLU, HIS, LYS, ASN,
GLN, TYR, or stop] at position 41 (NAS codon). Because
the PEPIs are displayed fused to gIII protein, DNA
containing stop codons will not give rise to infectious
phage in non-suppressor hosts.

For cassette mutagenesis, a 61 base long 
oligonucleotide DNA population was synthesized that 
contained 32,768 different DNA sequences coding on 
expression for a total of 11,200 amino acid sequences. 
This oligonucleotide extends from the third base of 
codon 51 in Table 113 (the middle of the StuI site) to 
base 2 of codon 70 (the EagI site (identified as XmaIII
in Table 113)).
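
The counts of 32,768 DNA sequences and 11,200 amino acid sequences can be checked by enumerating the degenerate codons. The Python sketch below does this under the standard genetic code; it excludes the stop codon when counting amino acid sequences, and the helper name `expand` is purely illustrative.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Degenerate base symbols used in the text: N = A/C/G/T, S = C/G.
AMBIGUITY = {"N": "ACGT", "S": "CG", "A": "A", "C": "C", "G": "G", "T": "T"}


def expand(pattern):
    """Return every concrete codon matched by a degenerate codon pattern."""
    return ["".join(p) for p in product(*(AMBIGUITY[ch] for ch in pattern))]


# Variegation scheme from the text, keyed by residue position.
scheme = {34: "NNS", 36: "GST", 39: "NNS", 40: "GST", 41: "NAS"}

dna_sequences = 1
protein_sequences = 1
for position, pattern in scheme.items():
    codons = expand(pattern)
    residues = {CODON_TABLE[c] for c in codons} - {"*"}  # drop the stop codon
    dna_sequences *= len(codons)
    protein_sequences *= len(residues)

nns_residues = [CODON_TABLE[c] for c in expand("NNS")]
print(dna_sequences, protein_sequences)
print("VAL codons in NNS:", nns_residues.count("V"))
print("LEU codons in NNS:", nns_residues.count("L"))
```

The same enumeration shows that NNS encodes VAL with two codons and LEU with three, while TRP and HIS have one codon each.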

We used a mutagenesis method similar to that
described by Cwirla et al. (CWIR90) and other standard
DNA manipulations described in Maniatis et al. (MANI82)
and Sambrook et al. (SAMB89). EpiNE7 RF DNA was
restricted with EagI and StuI, agarose gel purified, and
dephosphorylated using HK(TM) phosphatase (Epicentre
Technologies). We prepared insert by annealing two
small, 16 base and 17 base, phosphorylated synthetic DNA
primers to the phosphorylated 61 base long
oligonucleotide population described above. The
resulting insert DNA population had the following
features: double stranded DNA ends capable of
regenerating upon ligation the EagI (5' overhang) and
StuI (blunt) restricted sites of the EpiNE7 RF DNA, and
single stranded DNA in the central mutagenic region.
Insert and EpiNE7 vector DNA were ligated. Ligation
samples were used to transfect competent XL1-Blue(TM)
cells which were subsequently plated for formation of
ampicillin resistant (ApR) colonies. The resulting
phage-producing, ApR colonies were harvested and
recombinant phage was isolated. By following these
procedures, a phage library of 1.2·10⁵ independent
transformants was assembled. We estimated that 97.4% of
the approximately 3.3·10⁴ possible DNA sequences were
represented:

0.974 = (1 - exp{-1.2·10⁵/32768}).
The probability of observing the parental sequence is
higher than 0.974 because VAL occurs twice in the NNS
codon:

Probability of seeing (V34, G36, M39, G40, N41)
= (1 - exp{-(1.2·10⁵ x 2/32768)})
= (1 - exp{-7.32})
= (1 - 6.6·10⁻⁴)
= 0.99934

Furthermore, we expect that a small amount (for example,
1 part in 1000) of uncut or once-cut and religated
parental vector would come through the procedures used.
Thus the parental sequence is almost certainly present
in the library. This library is designated the KLMUT
library.
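
The two completeness estimates above use the expression 1 - exp{-N x c/L}, where N is the number of independent transformants, L the number of possible DNA sequences, and c the number of DNA sequences that encode the sequence of interest. A minimal Python sketch of that calculation follows; like the formula in the text, it assumes every DNA sequence is equally abundant in the ligation, and the function name is hypothetical.

```python
import math


def probability_present(transformants, library_size, encoding_sequences=1):
    """Chance that a given sequence appears at least once among the transformants."""
    expected_copies = transformants * encoding_sequences / library_size
    return 1.0 - math.exp(-expected_copies)


transformants = 1.2e5
dna_sequences = 32768

# Any single DNA sequence in the library.
p_any = probability_present(transformants, dna_sequences)

# Parental amino acid sequence: VAL34 is encoded by two NNS codons (GTC, GTG).
p_parental = probability_present(transformants, dna_sequences, encoding_sequences=2)

print(f"single DNA sequence: {p_any:.3f}")
print(f"parental sequence:   {p_parental:.5f}")
```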

B. AFFINITY SELECTION WITH IMMOBILIZED HUMAN
NEUTROPHIL ELASTASE

1) First Fractionation 

We added 1.1·10⁸ plaque forming units of the KLMUT
library to 10 µl of a 50% slurry of agarose-immobilized
human neutrophil elastase beads (HNE from Calbiochem
cross-linked to Reacti-Gel(TM) agarose beads from Pierce
Chemical Co. following manufacturer's directions) in
TBS/BSA. Following 3 hours incubation at room
temperature, the beads were washed and phage was eluted as done
in the selection of EpiNE phage isolates (Example IV).
The progression in lowering pH during the elution was:
pH 7.0, 6.0, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, and 2.0.
Beads carrying phage remaining after pH 2.0 elution were
used to infect XL1-Blue(TM) cells that were plated to
allow plaque formation. The 348 resulting plaques were
pooled to form a phage population for further affinity
selection. A population of phage particles containing
6.0·10⁸ plaque forming units was added to 10 µl of a 50%
slurry of agarose-immobilized HNE beads in TBS/BSA and
the above selection procedure was repeated.

Following this second round of affinity selection,
a portion of the beads was mixed with XL1-Blue(TM) cells
and plated to allow plaque formation. Of the resulting
plaques, 480 were pooled to form a phage population for

of RF DNA. DNA sequencing yielded the amino acid
sequence in the mutated secondary loop for 20 EpiNE7
homolog clones. These sequences, together with EpiNE7
(SEQ ID NO:48), are given in Table 210 as EpiNE7.21
through EpiNE7.40 (SEQ ID NOs:71 through 87). The
plaques observed when EpiNEs are plated display a
variety of sizes. EpiNE7.21 through EpiNE7.30 (SEQ ID
NOs:71 through 80) were picked with attention to plaque
size: 7.21, 7.22, and 7.23 from small plaques, 7.24
through 7.30 from plaques of increasing size, with 7.30
coming from a large plaque. TRP occurs at position 39
in EpiNE7.21, 7.22, 7.23, 7.25, and 7.30. Thus plaque
size does not correlate with the appearance of TRP at
39. One sequence, EpiNE7.31, from this fractionation is
identical to sequences EpiNE7.8 and EpiNE7.9 obtained in
the first fractionation. EpiNE7.30, EpiNE7.34, and
EpiNE7.35 are identical, indicating that the diversity
of the library has been greatly reduced. It is believed
that these sequences have an affinity for HNE that is at
least comparable to that of EpiNE7 and probably higher.
Because the parental EpiNE7 sequence did not recur, it
is quite likely that some or all of the EpiNE7.nn
derivatives have higher affinity for HNE than does
EpiNE7.

3) Conclusions 

One can draw some conclusions. First, because some
sequences have been isolated repeatedly, the
fractionation is nearly complete. The diversity has
been reduced from ≈10⁴ to a few tens of sequences.

Second, the parental sequence has not recurred. At
39, MET did not occur! At position 34 VAL occurred only
once in 35 sequences. At 41, ASN occurred only 4 of 35
times. At 40, GLY occurred 17 of 35 times. At position
36, GLY occurred 34 of 35 times, indicating that ALA is
undesirable here. EpiNE7.24 (SEQ ID NO:74) and
EpiNE7.36 (SEQ ID NO:83) are most like EpiNE7 (SEQ ID
NO:48), having three of the varied residues identical to
EpiNE7.

Third, the results of the first and second
fractionation are similar. In the second fractionation,
the prevalence of TRP at position 39 is more marked
(5/15 in fractionation #1, 14/20 in #2). It is possible
that the first fractionation lost some high-affinity
EPIs through under-sampling. Nevertheless, the first
fractionation was clearly quite successful.

Fourth, there are strong preferences at positions 
39 and 36 and lesser but significant preferences at 
positions 34 and 41 with little preference at 40. 

Heretofore, no homologues of aprotinin have been
reported having ALA at 36. In the selected EpiNE7.nn
sequences, the preference for GLY over ALA at position
36 is 34:1. This preference is probably not due to
differences in protein stability. The process of the
present invention, as applied in the present example,
does not select against proteins on the basis of
stability so long as the protein does fold and function
at the temperature used in the procedure. ALA is
probably tolerated at position 36 well enough to allow
those proteins having ALA36 to fold and function; one
example was found having ALA36. It may be relevant that
the sole sequence having ALA36 also has GLY34. The
flexibility of GLY at 34 may allow the methyl of ALA at
36 to fit into HNE in a way that is not possible when
other amino acids occupy position 34.

At position 39, all 20 amino acids were allowed,
but only seven were seen. TRP is strongly preferred
with 19 occurrences, HIS second with six occurrences, and
LEU third with 5 occurrences. No homologues of
aprotinin have been reported having either TRP or HIS at
position 39 as are now disclosed. Although LEU is
represented in the NNS codon thrice, TRP and HIS have
but one codon each and their prevalence is surprising.
We constructed a model having HNE (Brookhaven Protein
Data Bank entry 1HNE) and EpiNE7.9 (SEQ ID NO:60)
spatially related as in the 1TPA complex. (The α
carbons of HNE of conserved internal residues were
superimposed on the corresponding α carbons of trypsin,
rms deviation ~0.5 Å.) Inspection of this model
indicates that TRP39 could interact with the loop of HNE
that comprises VAL99, ASN99a, and LEU99b. HIS is observed
in six cases; HIS is hydrophobic, aromatic, and in some
ways similar to TRP. LEU39 in EpiNE7.5 could also
interact with these residues if the loop moves a short

GLU, GLY, or MET at 34 has been reported heretofore.
Here, as at position 39, the library contains an excess
of LEU over LYS and GLU. Thus, we infer that the
prevalence of LYS, GLU, THR, and LEU is related to
tighter binding of EpiNEs having these amino acids at
position 34. The prevalence of LYS is surprising, as
there are no acidic groups on HNE in the neighborhood.
The Nζ of LYS34 could interact with a main-chain
carbonyl oxygen while the methylene groups interact with
ILE151 and/or PHE192. LEU34 could interact with ILE151
and/or PHE192 while GLU34 could interact with ARG147.

There has been little if any enrichment at
positions 40 and 41. Alanine is somewhat preferred at
40; ALA:GLY::18:17. Both ALA and GLY have been reported
in aprotinin homologues.

Position 41 shows a preponderance of LYS (12
occurrences) and GLU (7), but all eight possibilities
have been seen. The overall distribution is LYS 12, GLU 7,
ASP 4, ASN 4, GLN 3, HIS 3, and TYR 2. Heretofore, no
homologues of aprotinin having GLU, GLN, HIS, or TYR at
position 41 have been reported.
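
To put the position 41 counts in perspective, the Python sketch below compares each observed count with the count expected if selection were neutral. It assumes that the seven amino acids encoded by the NAS codon are equally likely in the unselected library (one codon each, equimolar bases) and that stop-codon clones are not recovered; this is an illustrative calculation, not an analysis from the original work.

```python
# Observed residues at position 41 among the 35 sequenced clones (from the text).
observed = {"LYS": 12, "GLU": 7, "ASP": 4, "ASN": 4, "GLN": 3, "HIS": 3, "TYR": 2}

total = sum(observed.values())
# Assumed neutral expectation: each amino acid equally frequent.
expected = total / len(observed)

enrichment = {residue: count / expected for residue, count in observed.items()}

for residue, ratio in sorted(enrichment.items(), key=lambda item: -item[1]):
    print(f"{residue}: observed {observed[residue]:2d}, "
          f"expected {expected:.1f}, ratio {ratio:.2f}")
```

Under these assumptions LYS is over-represented about 2.4-fold, which is consistent with the modest enrichment described for this position.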

One sequence, EpiNE7.25 (SEQ ID NO:75), contains an
unexpected change at position 47, SER to LEU.
Heretofore, all homologues of aprotinin reported have
had either SER or THR at position 47. The side groups
of SER and THR can form hydrogen bonds to main-chain
atoms at the beginning of the short α helix.

The consensus sequence, LYS34, GLY36, TRP39, ALA40,
LYS41 was not observed. EpiNE7.23 (SEQ ID NO:73) is
quite close, differing only at position 40 where the
preference for ALA is very, very weak.

We tested EpiNE7.23 (the sequence closest to
consensus) against EpiNE7 (SEQ ID NO:48) on HNE beads.
Figure 16 shows the fractionation of strains of phage
that display these two EpiNEs. Phage that display
EpiNE7 are eluted at higher pH than are phage that
display EpiNE7.23. Furthermore, more of the EpiNE7.23
phage are retained than of the EpiNE7 phage. Note the
peak at pH 2.25 in the EpiNE7.23 elution. This suggests
that EpiNE7.23 has a higher affinity for HNE than does
EpiNE7. In a similar way, we tested EpiNE7.4 (SEQ ID
NO:63) and found that it is not retained on HNE so well
as EpiNE7. This is consistent with the fractionation
not being complete.

Further fractionation, characterization of clonally
pure EpiNE7.nn strains, and biochemical characterization
of soluble EpiNE7.nn derivatives will reveal which
sequences in this collection have the highest affinity
for HNE.

Fractionation of the library involves a number of 
factors. Differential binding allows phage that display 
PBDs having the desired binding properties to be 
enriched. Differences in infectivity, plaque size, and 
phage yield are related to differences in the sequence 
of the PBDs, but are not directly correlated to affinity
for the target. These factors may reduce the
effectiveness of the desired fractionation. An
additional factor that may be present is differential 
abundance of PBD sequences in the initial library. One 
step we employ to reduce the effect of differential 
infectivity is to transduce cells with isolated phage 
rather than to infect them. In the first fractionation, 
we did not obtain sufficient material for transduction 
and so infected cells; this fractionation was 
successful. Because the parental sequence, EpiNE7, was
selected for a sequence at residues 15 through 19 that
confers high affinity for HNE, we believe that many, if
not most, members of the KLMUT population have
significant affinity for HNE. Thus the present
fractionations must separate variants having very high
affinity for HNE from those merely having high affinity
for HNE. It is perhaps relevant that BPTI-III MK phage
are only partially eluted from immobilized trypsin at pH
2.2; Kd(trypsin, BPTI) = 6.0·10^-14 M. Elution of EpiNE7-
III MA phage from immobilized HNE gives a peak at about
pH 3.5 with some phage appearing at lower pH;
Kd(HNE, EpiNE7) < 1·10^-11 M. We recycled phage that
either were eluted at pH 2.0 or that were retained after
elution with pH 2.0 buffer. A large percentage of
EpiNE7-III MA phage would have been washed away with the
fractions at pHs less acid than 2.0. This, together
with the marked preferences at positions 39, 36, and 34,
strongly suggests that we have successfully fractionated

substitutions at several locations may be tested with an
amount of effort not much greater than is required to 
test a single derivative by previously used methods. 

There exist a number of proteases produced by
lymphocytes. Neutrophil elastase is not the only
lymphocytic protease that degrades elastin. The
protease p29 is related to HNE. Screening the MYMUT and
KLMUT libraries against immobilized p29 is likely to
allow isolation of an aprotinin derivative having high
affinity for p29.

EXAMPLE VII 
BPTI:VIII BOUNDARY EXTENSIONS.

The aim of this work was to introduce peptide 
extensions between the C-terminus of the BPTI domain and 
the N-terminus of the M13 major coat protein within the
fusion protein. The reasons for this were twofold:
firstly to alter potential protease cleavage sites at 
the interdomain boundary (as evidenced by an apparent 
instability of the fusion protein) and secondly to 
increase interdomain flexibility. 

1. Insertion of a variegated pentapeptide at the
BPTI:VIII interface.

The gene shown in Table 113 was modified by
insertion of five RVT codons between codons 81 and 82.
Two synthetic oligonucleotides were designed and custom
synthesized. The first consisted of, from 5' to 3': a)
from base 2 of codon 77 to the end of codon 81, b) five
copies of RVT, and c) from codon 82 to the second base
of codon 94. The second comprised 20 bases
complementary to the 3' end of the first
oligonucleotide. Each RVT codon allows one of the amino
acids [T, N, S, A, D, and G] to be encoded. This
variegation codon was picked because: a) each amino acid
occurs once, and b) all these amino acids are thought to
foster a flexible linker. When annealed, the primed
variegated oligonucleotide was converted to double-
stranded DNA using standard methods.
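
The coding properties of the RVT codon can be checked with a short sketch. The following Python fragment is illustrative only and is not part of the original work; the IUPAC ambiguity codes (R = A or G, V = A, C, or G) and the standard genetic code are assumed, and the names `expand` and `pentapeptide_variants` are hypothetical.

```python
from itertools import product

# IUPAC ambiguity codes needed for the RVT codon (assumed standard meanings).
IUPAC = {"R": "AG", "V": "ACG", "T": "T"}

# Standard genetic code, restricted to the codons that RVT can produce.
CODON_TO_AA = {
    "ACT": "T", "AAT": "N", "AGT": "S",
    "GCT": "A", "GAT": "D", "GGT": "G",
}

def expand(degenerate_codon):
    """Return every concrete codon encoded by a degenerate codon."""
    pools = [IUPAC[base] for base in degenerate_codon]
    return ["".join(bases) for bases in product(*pools)]

codons = expand("RVT")
amino_acids = sorted(CODON_TO_AA[c] for c in codons)

# Five independent RVT codons give 6**5 possible pentapeptides.
pentapeptide_variants = len(set(amino_acids)) ** 5

print(codons)                 # six codons, one per amino acid
print(amino_acids)            # A, D, G, N, S, T
print(pentapeptide_variants)  # 7776
```

Because each of the six codons encodes a different amino acid, the DNA-level and protein-level diversity of the pentapeptide are the same under these assumptions.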

The duplex was digested with restriction enzymes
SfiI and NarI and the resulting 45 base-pair fragment
was ligated into a similarly cleaved OCV, M13MB48
(Example I.l.iii.a). The ligated material was
transfected into competent E. coli cells (strain XL1-Blue(TM))
and plated onto a lawn of the same cells on
normal bacterial growth plates to form plaques. The
bacteriophage contained within the plaques were analyzed
using standard methods of nitrocellulose lifts and
probing using a 32P-labeled oligonucleotide complementary
to the DNA sequence encoding the fusion protein 
interface. Approximately 80% of the plaques probed 
poorly with this oligonucleotide and hence contained new 
sequences at this position. 

A pool of phages, containing the novel interface 
pentapeptide extensions, was collected by combining the 
phage extracted from the plated plaques. 
2. Adding multiple unit extensions to the fusion
protein interface.

The M13 gene III product contains 'stalk-like'
regions as implied by electron micrographic
visualization of the bacteriophage (LOPE85). The
predicted amino acid sequence of this protein contains
repeating motifs, which include:

glu.gly.gly.gly.ser (EGGGS) (SEQ ID NO:10) seven times
gly.gly.gly.ser (GGGS) (SEQ ID NO:14) three times
glu.gly.gly.gly.thr (EGGGT) (SEQ ID NO:15) once.

The aim of this section was to insert, at the 
domain interface, multiple unit extensions which would 
mirror the repeating motifs observed in the III gene
product.

Two synthetic oligonucleotides were designed and
custom synthesized. GLY is encoded by four codons
(GGN); when translated in the opposite direction, these
codons give rise to THR, PRO, ALA, and SER. The third
base of these codons was picked so that translation of
the oligonucleotide in the opposite direction would
encode SER. When annealed the synthetic
oligonucleotides give the following unit duplex sequence
(an EGGGS linker):

     E   G   G   G   S           (SEQ ID NO:10)
5' C GAG.GGA.GGA.GGA.TC 3'       (SEQ ID NO:100)
3'     TC.CCT.CCT.CCT.AGG.C 5'   (SEQ ID NO:101)
      (L) (S) (S) (S) (G)        (SEQ ID NO:^

The duplex has a common two base pair 5' overhang
(GC) at either end of the linker which allows for both
the ligation of multiple units and the ability to clone
into the unique NarI recognition sequence present in
OCVs M13MB48 and GemMB42. This site is positioned
within 1 codon of the DNA encoding the interface. The
cloning of an EGGGS linker (SEQ ID NO:10) (or multiple
linker) into the vector NarI site destroys this
recognition sequence. Insertion of the EGGGS linker in
reverse orientation leads to insertion of GSSSL (SEQ ID
NO:16) into the fusion protein.
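
The effect of linker orientation can be illustrated with a minimal sketch. It assumes that NarI cleaves GG^CGCC to leave a two-base overhang, so that the vector contributes GG on the left and CGCC on the right of the insert; the strand sequences are those of the unit duplex given above, and the function names are hypothetical.

```python
# Standard genetic code, restricted to the codons that occur in this example.
CODON_TO_AA = {
    "GGC": "G", "GGA": "G", "GAG": "E",
    "TCC": "S", "CTC": "L", "GCC": "A",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate a sequence read in frame from its first base."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

# Top strand of the unit duplex, 5' to 3', including its CG overhang.
TOP = "CGAGGGAGGAGGATC"
# Bottom strand, 5' to 3': its own CG overhang plus the complement of the
# thirteen paired bases of the top strand.
BOTTOM = "CG" + reverse_complement(TOP[2:])

# Assumed flanks left by NarI cleavage of the vector (GG^CGCC).
LEFT, RIGHT = "GG", "CGCC"

forward = LEFT + TOP + RIGHT      # linker in the EGGGS orientation
reverse = LEFT + BOTTOM + RIGHT   # linker in the opposite orientation

print(translate(forward))   # GEGGGSA
print(translate(reverse))   # GGSSSLA
print("GGCGCC" in forward, "GGCGCC" in reverse)  # NarI site absent in both
```

In either orientation the NarI recognition sequence is not regenerated, consistent with the use of NarI digestion after ligation to remove vector that has received no insert.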

Addition of a single EGGGS linker at the NarI site
of the gene shown in Table 113 leads to the following
gene:

 79  80  80a 80b 80c 80d 80e 81  82  83  84
 G   G   E   G   G   G   S   A   A   E   G    (SEQ ID NO:17)
GGT.GGC.GAG.GGA.GGA.GGA.TCC.GCC.GCT.GAA.GGT   (SEQ ID NO:102)



Note that there is no preselection for the
orientation of the linker(s) inserted into the OCV and
that multiple linkers of either orientation (with the
predicted EGGGS or GSSSL amino acid sequence) or a
mixture of orientations (inverted repeats of DNA) could
occur.

A ladder of increasingly large multiple linkers was
established by annealing and ligating the two starting
oligonucleotides containing different proportions of 5'
phosphorylated and non-phosphorylated ends. The logic
behind this is that ligation proceeds from the 3'
unphosphorylated end of an oligonucleotide to the 5'
phosphorylated end of another. The use of a mixture of
phosphorylated and non-phosphorylated oligonucleotides
allows for an element of control over the extent of
multiple linker formation. A ladder showing a range of
insert sizes was readily detected by agarose gel
electrophoresis spanning 15 bp (1 unit duplex, 5 amino
acids) to greater than 600 base pairs (40 ligated
linkers, 200 amino acids).

Large inverted repeats can lead to genetic 
instability. Thus we chose to remove them, prior to 
ligation into the OCV, by digesting the population of 
multiple linkers with the restriction enzymes AccIII or
XhoI, since the linkers, when ligated 'head-to-head' or
'tail-to-tail', generate these recognition sequences.
Such a digestion significantly reduces the range in
sizes of the multiple linkers to between 1 and 8 linker
units (i.e. between 5 and 40 amino acids in steps of 5),
as assessed by agarose gel electrophoresis.
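
The origin of these sites at inverted junctions, and the size arithmetic of the ladder, can be sketched as follows. The recognition sequences TCCGGA (AccIII) and CTCGAG (XhoI) are assumed, the unit sequences are those of the duplex described above, and the labels F and R for the two orientations are introduced here for illustration only.

```python
from itertools import product

# The 15-base unit as read on the coding strand in each orientation.
UNIT = {
    "F": "CGAGGGAGGAGGATC",   # EGGGS orientation
    "R": "CGGATCCTCCTCCCT",   # GSSSL orientation
}
# Recognition sequences assumed for the two enzymes.
SITES = {"AccIII": "TCCGGA", "XhoI": "CTCGAG"}

def sites_in(orientations):
    """Name the sites present in a concatenation of oriented units."""
    seq = "".join(UNIT[o] for o in orientations)
    return sorted(name for name, site in SITES.items() if site in seq)

# Examine every ordered pair of adjacent units.
junctions = {a + b: sites_in(a + b) for a, b in product("FR", repeat=2)}

def ladder_size(units):
    """Return (base pairs, amino acids) for a given number of ligated units."""
    return 15 * units, 5 * units

print(junctions)        # only the inverted pairs FR and RF carry a site
print(ladder_size(1))   # (15, 5)
print(ladder_size(8))   # (120, 40)
print(ladder_size(40))  # (600, 200)
```

Under these assumptions a direct repeat (FF or RR) creates neither site, so digestion leaves direct repeats intact while cleaving multimers that contain an inverted junction.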

The linkers were ligated (as a pool of different
insert sizes or as gel-purified discrete fragments) into
NarI cleaved OCVs M13MB48 or GemMB42 using standard
methods. Following ligation the restriction enzyme NarI
was added to remove the self-ligating starting OCV
(since linker insertion destroys the NarI recognition
sequence). This mixture was used to transform competent
XL1-Blue cells and appropriately plated for plaques
(OCV M13MB48) or ampicillin resistant colonies (OCV
GemMB42).

The transformants were screened using dot blot DNA
analysis with one of two 32P-labeled oligonucleotide
probes. One probe consisted of a sequence complementary
to the DNA encoding the P1 loop of BPTI while the second
had a sequence complementary to the DNA encoding the
domain interface region. Suitable linker candidates
would probe positively with the first probe and
negatively or poorly with the second. Plaque purified
clones were used to generate phage stocks for binding
analyses and BPTI display while the RF DNA derived from
phage infected bacterial cells was used for restriction
enzyme analysis and sequencing. Representative insert
sequences of selected clones analyzed are as follows:

M13.3X4    (GG)C.GGA.TCC.TCC.TCC.CT(C.GCC)  (SEQ ID NO:103)
                     ser ser ser leu        (AA 6-10 of SEQ ID NO: 11^

M13.3X7    (GG)C.GAG.GGA.GGA.GGA.TC(C.GCC)  (SEQ ID NO:104)
                 glu gly gly gly ser        (SEQ ID NO:10)

M13.3X11   (GG)C.GAG.GGA.GGA.GGA.TCC.GGA.TCC.TCC.
                 glu gly gly gly ser gly ser ser
           TCC.CTC.GGA.TCC.TCC.TCC.CT(C.GCCC)  (SEQ ID NO:105)
           ser leu gly ser ser ser leu         (SEQ ID NO:18)



These highly flexible oligomeric linkers are believed to
be useful in joining a binding domain to the major coat
(gene VIII) protein of filamentous phage to facilitate
the display of the binding domain on the phage surface.
They may also be useful in the construction of chimeric
OSPs for other genetic packages.

EXAMPLE VIII 
BACTERIAL EXPRESSION VECTORS. 

The expression vectors were designed for the
bacterial production of BPTI analogues resulting from the
mutagenesis and screening for variants with specific
binding properties. The expression vectors used are
derivatives of the OCVs M13MB48 and GemMB42. The
conversion was achieved by replacing the first codon of 
the mature VIII gene (codon 82 as shown in Table 113) 
with a translational stop codon by site specific 
mutagenesis. 

The salient points of the expression vector
composition are identical to those of the parent OCVs,
namely a lacUV5 promoter (hence IPTG induction),
ribosome binding site, initiating methionine, phoA
signal peptide and transcriptional termination signal
(see Table 113). The placement of the stop codon allows
for the expression of only the first half of the fusion
protein. The Gem-based expression system, containing
the genes encoding BPTI analogues, is stored as plasmid
DNA, being freshly transfected into cells for expression
of the analogue protein. The M13-based expression
system is stored as both RF DNA and as phage stocks.

contained within the intergenic region and its
transcription is under the control of a lacUV5 promoter,
hence IPTG inducible. The expression vector, containing
the gene of interest, is maintained and utilized as a 
phage stock. This method enables a potentially lethal 
or deleterious gene to be supplied to a bacterial 
culture and gene induction to occur only when the 
bacterial culture has achieved sufficient mass. Poor 
growth and insert instability can be circumvented to a 
large extent, giving this system an advantage over the 
Gem-based vector described above. 

An overnight bacterial culture of XL1-Blue(TM) or
SEF' is grown in LB medium containing tetracycline (50
µg per ml) to ensure the presence of pili as sites for
bacteriophage binding and infection. This culture is
diluted 100-fold into NZCYM medium containing
tetracycline and bacterial growth allowed to proceed in
an incubator shaker until a cell density of 1.0
(absorbance at 600 nm) has been achieved. Phage, containing the
expression vector and gene of interest, are added to the
bacterial culture at a multiplicity of infection (MOI)
of 10 and allowed to infect the cells for 30 minutes.
Gene expression is then induced by the addition of IPTG
to a final concentration of 0.5 mM and the culture
allowed to grow overnight. Media collection and cell
fractionation are as described elsewhere.

Bacterial Cell Fractionation.

After heterologous gene expression the bacterial 
cell culture can be separated into the following 
fractions: conditioned medium, periplasmic fraction and 
post-periplasmic cell lysate. This is achieved using 
the following procedures. 

The culture is centrifuged to pellet the bacteria, 
allowing the supernatant to be stored as conditioned 
medium. This fraction contains any exported proteins. 
The pellet is taken up in 20% sucrose, 30 mM Tris pH 8
and 1 mM EDTA (80 ml of buffer per gram of fresh weight
pellet) and allowed to sit at room temperature for 10
minutes. The cells are repelleted and taken up in the
same volume of ice cold 5 mM MgSO4 and left on ice for 10
minutes. Following centrifugation, to pellet the cells,
the supernatant (periplasmic fraction) is stored. A
second round of osmotic shock fractionation can be 
undertaken if desired. 

The post-periplasmic pellet can be further lysed as
follows. The pellet is resuspended in 1.5 ml of 20%
sucrose, 40 mM Tris pH 8, 50 mM EDTA and 2.5 mg of
lysozyme (per gram fresh weight of starting pellet).
After 15 minutes at room temperature 1.15 ml of 0.1%
Triton X is added together with 300 µl of 5 M NaCl and
incubated for a further 15 minutes. 2.5 ml of 0.2 M
triethanolamine (pH 7.8), 150 µl of 1 M CaCl2, 100 µl of
1 M MgCl2 and 5 of DNAse are added and allowed to
incubate, with end-over-end mixing, for 20 minutes to
reduce viscosity. This is followed by centrifugation
with the supernatant being retained as the post-
periplasmic lysate.

The present invention is not, of course, limited to
any particular expression system, whether bacterial or
not.

EXAMPLE XX 

CONSTRUCTION OF AN ITI-DOMAIN I/GENE III DISPLAY VECTOR

1. ITI domain I as an IPBD

Inter-α-trypsin inhibitor (ITI) is a large (Mr ca.
240,000) circulating protease inhibitor found in the
plasma of many mammalian species (for recent reviews see
ODOM90, SALI90, GEBH90, GEBH86). The intact inhibitor
is a glycoprotein and is currently believed to consist
of three glycosylated subunits that interact through a
strong glycosaminoglycan linkage (ODOM90, SALI90,
ENGH89, SELL87). The anti-trypsin activity of ITI is
located on the smallest subunit (ITI light chain,
unglycosylated Mr ca. 15,000) which is identical in amino
acid sequence to an acid stable inhibitor found in urine
(UTI) and serum (STI) (GEBH86, GEBH90). The mature
light chain consists of a 21 residue N-terminal
sequence, glycosylated at SER 10, followed by two tandem
Kunitz-type domains the first of which is glycosylated
at ASN 45 (ODOM90). In the human protein, the second
Kunitz-type domain has been shown to inhibit trypsin,
chymotrypsin, and plasmin (ALBR83a, ALBR83b, SELL87,
SWAI88). The first domain lacks these activities but
has been reported to inhibit leukocyte elastase (10^-6 > Ki
> 10^-9) (ALBR83a,b, ODOM90). cDNA encoding the ITI light
chain also codes for α-1-microglobulin (TRAB86, KAUM86,
DIAR90); the proteins are separated post-translationally
by proteolysis.

The N-terminal Kunitz-type domain of the ITI light chain
(ITI-D1, comprising residues 22 to 76 of the UTI
sequence shown in Fig. 1 of GEBH86) possesses a number
of characteristics that make it useful as an IPBD. The
domain is highly homologous to both BPTI and the EpiNE
series of proteins described elsewhere in the present
application. Although an x-ray structure of the
isolated domain is not available, crystallographic
studies of the related Kunitz-type domain isolated from
the Alzheimer's amyloid β-protein (AASP) precursor show
that this polypeptide assumes a crystal structure almost
identical to that of BPTI (HYNE90). Thus, it is likely
that the solution structure of the isolated ITI-D1
polypeptide will be highly similar to the structures of
BPTI and AASP. In this case, the advantages described
previously for use of BPTI as an IPBD apply to ITI-D1.
ITI-D1 provides additional advantages as an IPBD for the
development of specific anti-elastase inhibitory
activity. First, this domain has been reported to
inhibit both leukocyte elastase (ALBR83a,b, ODOM90) and
Cathepsin-G (SWAI88, ODOM90), activities which BPTI
lacks. Second, ITI-D1 lacks affinity for the related
serine proteases trypsin, chymotrypsin, and plasmin
(ALBR83a,b, SWAI88), an advantage for the development of
specificity in inhibition. Finally, ITI-D1 is a human-
derived polypeptide, so derivatives are anticipated to
show minimal antigenicity in clinical applications.

2. Construction of the display vector.

For purposes of this discussion, numbering of the 
nucleic acid sequence for the ITI light chain gene is 
that of TRAB86 and of the amino acid sequence is that 
shown for UTI in Fig. 1 of GEBH86. DNA manipulations
were conducted according to standard methods as
described in SAMB89 and AUSU87.

The protein sequence of human ITI-D1 consists of 56
amino acid residues extending from LYS 22 to ARG 77 of the
complete ITI light chain sequence. This sequence is
encoded by the 168 bases between positions 750 and 917
in the cDNA sequence presented in TRAB86. The majority
of the domain is contained between a BglI site spanning
bases 763 to 773 and a PstI site spanning bases 903 to
908. The insertion of the ITI-D1 sequence into M13 gene
III was conducted in two steps. First a linker
containing the appropriate ITI sequences outside the
central BglI to PstI region was ligated into the NarI
site of phage MA RF DNA. In the second step, the
remainder of the ITI-D1 sequence was incorporated into
the linker-bearing phage RF DNA.

The linker DNA consisted of two synthetic
oligonucleotides (top and bottom strands) which, when
annealed, produced a 54 bp double-stranded fragment with
the following structure (5' to 3'):

NAR I OVERHANG/ITI-5'/BGL I/STUFFER/PST I/ITI-3'/NAR I OVERHANG

The NarI OVERHANG sequences provide compatible ends
for ligation into a cut NarI site. The ITI-5' sequence
consists of dsDNA corresponding to the thirteen
positions from A750 to T762 immediately 5' adjacent to
the BglI site in the ITI-D1 sequence. Two changes, both
silent, are introduced in this sequence: T to C at
position 758 (changes codon for ASP 24 from GAT to GAC)
and G to T at position 761 (changes codon for SER 25 from
TCG to TCT). The sequences BGL I and PST I are identical
to the BglI and PstI sites, respectively, in the ITI-D1
sequence. The ITI-3' sequence consists of dsDNA
corresponding to the nine positions from A909 to T917
immediately 3' adjacent to the PstI site in the ITI-D1
sequence. The one base change included in this
sequence, A to T at position 917, is silent and changes
the codon for ARG 77 from CGA to CGT. The STUFFER
sequence consists of dsDNA encoding three residues (5'
to 3'): LEU (TTA), TRP (TGG), and SER (TCA). The reverse
complement of the STUFFER sequence encodes two
translation termination codons (TGA and TAA). Phage
expressing gene III containing the linker in opposite
orientation to that shown above will not produce a
functional gene III product.
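
The coordinate arithmetic and the behavior of the STUFFER in reverse orientation can be checked with a short sketch. It assumes only what is stated above: that the codon for LYS 22 begins at base 750, that the domain ends at base 917, and that the STUFFER reads TTA.TGG.TCA. The helper names are hypothetical.

```python
FIRST_BASE = 750      # first base of the LYS 22 codon (TRAB86 numbering)
FIRST_RESIDUE = 22
LAST_BASE = 917

def third_base(residue):
    """Return the cDNA position of the third base of a residue's codon."""
    return FIRST_BASE + 3 * (residue - FIRST_RESIDUE) + 2

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def codons(seq):
    """Split an in-frame sequence into successive codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

STUFFER = "TTATGGTCA"   # LEU, TRP, SER in the forward orientation

domain_residues = (LAST_BASE - FIRST_BASE + 1) // 3
reverse_codons = codons(reverse_complement(STUFFER))
reverse_stops = [c for c in reverse_codons if c in STOP_CODONS]

print(domain_residues)                                 # 56
print(third_base(24), third_base(25), third_base(77))  # 758 761 917
print(reverse_codons)                                  # ['TGA', 'CCA', 'TAA']
print(reverse_stops)                                   # ['TGA', 'TAA']
```

On this numbering, each of the silent changes falls at a third codon position, and a linker inserted in the reverse orientation presents two in-frame termination codons.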
Phage MA RF DNA was digested with NarI and the
linear ca. 8.2 kb fragment was gel purified and
subsequently dephosphorylated using HK phosphatase
(Epicentre). The linker oligonucleotides were annealed
to form the linker fragment described above, which was
then kinased using T4 Polynucleotide Kinase. The
kinased linker was ligated to the NarI-digested MA RF
DNA in a 10:1 (linker:RF) molar ratio. After 18 hrs at
16°C, the ligation was stopped by incubation at 65°C for
10 min and the ligation products were ethanol
precipitated in the presence of 10 µg of yeast tRNA.
The dried precipitate was dissolved in 5 µl of water and
used to transform D1210 cells by electroporation. After
60 min of growth in SOC at 37°C, transformed cells were
plated onto LB plates supplemented with ampicillin (Ap,
200 µg/ml). RF DNA prepared from Ap^r isolates was
subjected to restriction enzyme analysis. The DNA
sequences of the linker insert and the immediately
surrounding regions were confirmed by DNA sequencing.
Phage strains containing the ITI linker sequence
inserted into the NarI site in gene III are called
MA-IL.

Phage MA-IL RF DNA was partially digested with BglI
and the ca. 8.2 kb linear fragment was gel purified.
This fragment was digested with PstI and the large
linear fragment was gel purified. The BglI to PstI
fragment of ITI-D1 was isolated from pMGIA (a plasmid
carrying the sequence shown in TRAB86). pMGIA was

to PstI region of the ITI-D1 sequence, while IAI-2 spans
the PstI site in the ITI-D1 sequence. When aliquots of
POP1 phage were used as substrates for PCR, template-
specific products of characteristic size were produced
in reactions containing 1UP or 2UP plus IAI-1 or IAI-2
primer pairs. No such products are obtained using MA-IL
phage as template. No PCR products with sizes
corresponding to complete ITI-D1-gene III templates were
obtained using POP1 phage and the 1UP or 2UP plus 3DN
primer pairs. This last result reflects the low
abundance (<1%) of phage containing the complete ITI-D1
sequence in POP1.

Preparative PCR was used to generate substrate
amounts of the 330 bp PCR product of a reaction using
the 1UP and IAI-2 primer pair to amplify the POP1
template. The 330 bp PCR product was gel purified and
then cut to completion with BglI and PstI. The 138 bp
BglI to PstI fragment from ITI-D1 was isolated by
agarose gel electrophoresis followed by Qiaex extraction
(Qiagen, Studio City, CA). MA-IL phage RF DNA was
digested to completion with PstI. The ca. 8.2 kb linear
fragment was gel purified and subsequently digested to
completion with BglI. The BglI digest was extracted
once with phenol:chloroform (1:1), the aqueous phase was
ethanol precipitated, and the pellet was dissolved in TE
(pH 8.0). An aliquot of this solution was used in a
ligation reaction with the 138 bp BglI to PstI fragment
as described above. The ethanol precipitated ligation
products were used to transform XL1-Blue(TM) cells by
electroporation and after 1 hr growth in SOC at 37°C,
cells were plated on LB Ap plates. A phage population,
POP2, was prepared from Ap^r colonies as described
previously.

Phage stocks obtained from individual plaques 
produced on titration of POP2 were tested by PCR for the 
presence of the complete ITI-D1-III gene fusion. PCR 
results indicate the entire fusion gene was present in 
seven of nine isolates tested. RF DNA from the seven 
isolates testing positive was subjected to restriction 
enzyme analysis. The complete sequence of the ITI-D1 
insertion into gene III was confirmed in four of the 
seven isolates by DNA sequence analysis. Phage isolates 
containing the ITI-D1-III fusion gene are called MA-ITI. 

3. Expression and display of ITI-D1.

Expression of the ITI domain I-gene III fusion
protein and its display on the surface of phage were
demonstrated by Western analysis and phage titer
neutralization experiments.

For Western analysis, aliquots of PEG-purified
phage preparations containing up to 4·10^10 infective
particles were subjected to electrophoresis on a 12.5%
SDS-urea-polyacrylamide gel. Proteins were transferred
to a sheet of Immobilon-P transfer membrane (Millipore,
Bedford, MA) by electrotransfer. Western blots were
developed using a rabbit anti-ITI serum (SALI87) which
had previously been incubated with an E. coli extract,
followed by goat anti-rabbit IgG conjugated to
horseradish peroxidase (#401315, Calbiochem, La Jolla, CA).
An immunoreactive protein with an apparent size of ca.
65-69 kD is detected in preparations of MA-ITI phage but
not with preparations of the parental MA phage. The
size of the immunoreactive protein is consistent with
the expected size of the processed ITI-D1-III fusion
protein (ca. 67 kD, as previously observed for the BPTI-
III fusion protein).

Rabbit anti-BPTI serum has been shown to block the
ability of MK-BPTI phage to infect E. coli cells
(Example II). To test for a similar effect of rabbit
anti-ITI serum on the infectivity of MA-ITI phage, 10 µl
aliquots of MA or MA-ITI phage were incubated in 100 µl
reactions containing 10 µl aliquots of PBS, normal
rabbit serum (NRS), or anti-ITI serum. After a three
hour incubation at 37°C, phage suspensions were titered
to determine residual plaque-forming activity. These
data are summarized in Table 211. Incubation of MA-ITI 
phage with rabbit anti-ITI serum reduces titers 10- to 
100-fold, depending on initial phage titer. A much 
smaller decrease in phage titer (10 to 40%) is observed 
when MA-ITI phage are incubated with NRS. In contrast, 
the titer of the parental MA phage is unaffected by 
either NRS or anti-ITI serum. 

Taken together, the results of the Western analysis
and the phage-titer neutralization experiments are
consistent with the expression of an ITI-D1-III fusion
protein in MA-ITI phage, but not in the parental MA
phage, such that ITI-specific epitopes are present on
the phage surface. The ITI-specific epitopes are
located with respect to III such that antibody binding
to these epitopes prevents phage from infecting E. coli
cells.

4. Fractionation of MA-ITI phage bound to agarose-
immobilized protease beads.

To test if phage displaying the ITI-D1-III fusion
protein interact strongly with the proteases human
neutrophil elastase (HNE) or cathepsin-G, aliquots of
display phage were incubated with agarose-immobilized
HNE or cathepsin-G beads (HNE beads or Cat-G beads,
respectively) . The beads were washed and bound phage 
eluted by pH fractionation as described in Examples II 
and III. The procession in lowering pH during the 
elution was: pH 7.0, 6.0, 5.5, 5.0, 4.5, 4.0, 3.5, 3.0, 
2.5, and 2.0. Following elution and neutralization, the 
various input, wash, and pH elution fractions were 
titered. 

The results of several fractionations are 
summarized in Table 212 (EpiNE-7 or MA-ITI phage bound 
to HNE beads) and Table 213 (EpiC-10 or MA-ITI phage 
bound to Cat-G beads) . For the two types of beads (HNE 
or Cat-G) , the pH elution profiles obtained using the 
control display phage (EpiNE-7 or EpiC-10, respectively) 
were similar to those seen previously (Examples II and 
III). About 0.3% of the EpiNE-7 display phage applied
to the HNE beads were eluted during the fractionation
procedure and the elution profile had a maximum for
elution at about pH 4.0. A smaller fraction, 0.02%, of
the EpiC-10 phage applied to the Cat-G beads were eluted
and the elution profile displayed a maximum near pH 5.5.

The MA-ITI phage show no evidence of great affinity 
for either HNE or cathepsin-G immobilized on agarose 
beads. The pH elution profiles for MA-ITI phage bound 
to HNE or Cat-G beads show essentially monotonic 
decreases in phage recovered with decreasing pH. 
Further, the total fractions of the phage applied to the 
beads that were recovered during the fractionation 
procedures were quite low: 0.002% from HNE beads and 
0.003% from Cat-G beads. 

Published values of Ki for inhibition of neutrophil
elastase by the intact, large (Mr = 240,000) ITI protein
range between 60 and 150 nM and values between 20 and
6000 nM have been reported for the inhibition of
Cathepsin G by ITI (SWAI88, ODOM90). Our own
measurements of pH fractionation of display phage bound to
HNE beads show that phage displaying proteins with low
affinity (>µM) for HNE are not bound by the beads while
phage displaying proteins with greater affinity (nM)
bind to the beads and are eluted at about pH 5. If the
first Kunitz-type domain of the ITI light chain is
entirely responsible for the inhibitory activity of ITI
against HNE, and if this domain is correctly displayed
on the MA-ITI phage, then it appears that the minimum
affinity of an inhibitor for HNE that allows binding and
fractionation of display phage on HNE beads is 50 to

position 15, ATG (MET), was changed to GTC (VAL), the
codon for position 16, GGA (GLY), was changed to GCT
(ALA), the codon for position 18, ACC (THR), was changed
to TTC (PHE), and the codon for position 19, AGC (SER),
was changed to CCA (PRO). MA-ITI RF DNA was digested
with EagI and StyI. The large, linear fragment was
gel purified and used in a ligation with the mutagenic
cassette described above. Ligation products were used
to transform XL1-Blue(TM) cells as described previously.
Phage stocks obtained from overnight cultures of Ap^r
transductants were screened by PCR for incorporation of
the altered sequence and the changes in the codons for
positions 15, 16, 18, and 19 were confirmed by DNA
sequencing. Phage isolates containing the ITI-D1-III
fusion gene with the EpiNE-7 changes around the P1
position are called MA-ITI-E7.

6. Fractionation of MA-ITI-E7 phage.

To test if the changes at positions 15, 16, 18, and 
19 of the ITI-DI-III fusion protein influence binding of 
display phage to HNE beads, abbreviated pH elution 
profiles were measured. Aliquots of EpiNE-7, MA-ITI, 
and MA-ITI-E7 display phage were incubated with HNE 
beads for three hours at room temperature. The beads
were washed and phage were eluted as described (Example 
III), except that only three pH elutions were performed: 
pH 7.0, 3.5, and 2.0. The results of these elutions are 
shown in Table 214.

Binding and elution of the EpiNE-7 and MA-ITI
display phage were found to be as previously described.
The total fraction of input phage eluted was high (0.4%) for
EpiNE-7 phage and low (0.001%) for MA-ITI phage.
Further, the EpiNE-7 phage showed maximum phage elution
in the pH 3.5 fraction while the MA-ITI phage showed
only a monotonic decrease in phage yields with
decreasing pH, as seen above.

The two strains of MA-ITI-E7 phage show increased 
levels of binding to HNE beads relative to MA-ITI phage. 
The total fraction of the input phage eluted from the 
beads is 10-fold greater for both MA-ITI-E7 phage 
strains than for MA-ITI phage (although still 40-fold
lower than EpiNE-7 phage). Further, the pH elution
profiles of the MA-ITI-E7 phage strains show maximum
elutions in the pH 3.5 fractions, similar to EpiNE-7
phage.

To further define the binding properties of MA-ITI-E7
phage, the extended pH fractionation procedure
described previously was performed using phage bound to
HNE beads. These data are summarized in Table 215. The
pH elution profile of EpiNE-7 display phage is as
previously described. In this more resolved pH elution
profile, MA-ITI-E7 phage show a broad elution maximum
centered around pH 5. Once again, the total fraction of
MA-ITI-E7 phage obtained on pH elution from HNE beads
was about 40-fold less than that obtained using EpiNE-7
display phage.

The pH elution behavior of MA-ITI-E7 phage bound to
HNE beads is qualitatively similar to that seen using
BPTI[K15L]-III-MA phage. BPTI with the K15L mutation
has an affinity for HNE of about 3 x 10^-9 M. Assuming all else
remains the same, the pH elution profile for MA-ITI-E7
suggests that the affinity of the free ITI-DI-E7 domain
for HNE might be in the nM range. If this is the case,
the substitution of the EpiNE-7 sequence in place of the
ITI-DI sequence around the P1 region has produced a 20-
to 50-fold increase in affinity for HNE (assuming Ki = 60
to 150 nM for the unaltered ITI-DI).
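
As a rough arithmetic check of the 20- to 50-fold estimate, the following sketch divides the reported Ki range for unaltered ITI-DI by an assumed Ki for ITI-DI-E7. The 3 nM figure is an assumption borrowed from the BPTI[K15L] comparison; the text says only that the affinity "might be in the nM range". The function and variable names are illustrative.

```python
# Ki range reported for unaltered ITI-DI against HNE (nM), from the text.
PARENT_KI_RANGE_NM = (60.0, 150.0)

# ASSUMPTION: ITI-DI-E7 binds HNE about as tightly as BPTI[K15L] (~3 nM).
ASSUMED_VARIANT_KI_NM = 3.0


def fold_increase(parent_ki_nm, variant_ki_nm):
    """Return how many fold tighter the variant binds than the parent."""
    return parent_ki_nm / variant_ki_nm


fold_range = tuple(
    fold_increase(ki, ASSUMED_VARIANT_KI_NM) for ki in PARENT_KI_RANGE_NM
)
print("fold increase in affinity:", fold_range)
```

A different assumed Ki for ITI-DI-E7 scales both ends of the range proportionally.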

If EpiNE-7 and ITI-DI-E7 have the same solution
structure, these proteins present identical amino
acid sequences to HNE over the interaction surface.
Despite this similarity, EpiNE-7 exhibits a roughly
1000-fold greater affinity for HNE than does ITI-DI-E7.
Again assuming similar structure, this observation
highlights the importance of non-contacting secondary
residues in modulating interaction strengths.

Native ITI light chain is glycosylated at two
positions, SER10 and ASN45 (GEBH86). Removal of the
glycosaminoglycan chains has been shown to decrease the
affinity of the inhibitor for HNE about 5-fold (SELL87).
Another potentially important difference between EpiNE-7
and ITI-DI-E7 is that of net charge. The changes in
BPTI that produce EpiNE-7 reduce the total charge on the
molecule from +6 to +1. Sequence differences between
EpiNE-7 and ITI-DI-E7 further reduce the charge on the
latter to -1. Furthermore, the change in net charge
between these two molecules arises from sequence
differences occurring in the central portions of the
molecules. Position 26 is LYS in EpiNE-7 and is THR in
ITI-DI-E7, while at position 31 these residues are GLN
and GLU, respectively. These changes in sequence not
only alter the net charge on the molecules but also
position a negatively charged residue close to the
interaction surface in ITI-DI-E7. It may be that the
occurrence of a negative charge at position 31 (which is
not found in any other of the HNE inhibitors described
here) destabilizes the inhibitor-protease interaction.

EXAMPLE X 

GENERATION OF A VARIEGATED ITI-DI POPULATION 

The following is a hypothetical example
demonstrating how to obtain a derivative of ITI having
high affinity for HNE.

The results of Example IX demonstrate that the
nature of the protein sequence around the P1 position in
ITI-DI can significantly influence the strength of the
interaction between ITI-DI and HNE. While incorporation
of the EpiNE-7 sequence increases the affinity of ITI-DI
for HNE, it is unlikely that this particular sequence is
optimal for binding.

We generate a large population of potential binding
proteins having differing sequences in the P1 region of
ITI-DI using the oligonucleotide ITIMUT. ITIMUT is
designed to incorporate variegation in ITI-DI at the six
positions about and including the P1 residue: 13, 15,
16, 17, 18, and 19. ITIMUT is synthesized as one long
(top strand) 73 base oligonucleotide and one shorter (24
base) bottom strand oligonucleotide. The top strand
sequence extends from position 770 (G) to position 842
(G) in the sequence of TREB86. This sequence includes
the codons for the positions of variegation as well as
the recognition sequences for the flanking restriction
enzymes Eag I (778 to 783) and Sty I (829 to 834). The
bottom strand oligonucleotide comprises the complement
of the sequence from positions 819 to 842.

To generate the mutagenic cassette, the top and 
bottom strand oligonucleotides are annealed and the 
resulting duplex is completed in an extension reaction 
using DNA polymerase. Following digestion of the 73 bp 
dsDNA with Eag I and Sty I, the purified 51 bp mutagenic 
cassette is ligated with the large linear fragment 
obtained from a similar digestion of MA-ITI RF DNA. 
Ligation products are used to transform competent cells 
by electroporation and phage stocks produced from Ap^r
transductants are analyzed for the presence and nature 
of novel sequences as described previously. 
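
The oligonucleotide and cassette lengths quoted for ITIMUT can be cross-checked from the stated coordinates. This is a minimal sketch. It assumes that both Eag I (C^GGCCG) and Sty I (C^CWWGG) cut the top strand one base into their six-base recognition sites, and that the 51 bp figure counts top-strand bases between the two cuts. The variable names are illustrative.

```python
# Coordinates in the TREB86 numbering, taken from the text.
TOP_START, TOP_END = 770, 842        # top-strand oligonucleotide
BOTTOM_START, BOTTOM_END = 819, 842  # region complemented by the bottom strand
EAG_I_SITE = (778, 783)              # Eag I recognition sequence
STY_I_SITE = (829, 834)              # Sty I recognition sequence

# ASSUMPTION: each enzyme cleaves the top strand after the first base of its site.
CUT_OFFSET = 1


def span_length(start, end):
    """Number of bases in an inclusive coordinate range."""
    return end - start + 1


top_len = span_length(TOP_START, TOP_END)
bottom_len = span_length(BOTTOM_START, BOTTOM_END)

# First and last top-strand bases retained after the double digest.
first_kept = EAG_I_SITE[0] + CUT_OFFSET
last_kept = STY_I_SITE[0] + CUT_OFFSET - 1
cassette_len = span_length(first_kept, last_kept)

print(top_len, bottom_len, cassette_len)
```

Under these assumptions the lengths agree with the 73-base, 24-base, and 51 bp figures given for ITIMUT.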

The variegation in the ITIMUT cassette is confined 
to the codons for the six positions in ITI-DI (13, 15, 
16, 17, 18, and 19), and employs three different 
nucleotide mixes: N, R, and S. For this mutagenesis, 
the composition of the N-mix is 36%A, 17%C, 23%G, and
24%T, and corresponds to the N-mix composition in the
optimized NNS codon described elsewhere. The R-mix 
composition is 50%A, 50%G, and the S-mix composition is 
50%C, 50%G. 
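
To see what these mixes imply at the amino acid level, the sketch below computes the expected residue distribution for an NNS codon built from the stated N-mix and S-mix. It assumes the same N-mix is used at both N positions and that bases are incorporated independently; the standard genetic code table is written out in compact form. Names such as `codon_distribution` are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Mix compositions from the text.
N_MIX = {"A": 0.36, "C": 0.17, "G": 0.23, "T": 0.24}
S_MIX = {"C": 0.50, "G": 0.50}


def codon_distribution(position_mixes):
    """Expected frequency of each encoded residue ('*' is stop)."""
    dist = {}
    for choice in product(*(mix.items() for mix in position_mixes)):
        codon = "".join(base for base, _ in choice)
        prob = 1.0
        for _, freq in choice:
            prob *= freq
        residue = CODON_TABLE[codon]
        dist[residue] = dist.get(residue, 0.0) + prob
    return dist


nns = codon_distribution([N_MIX, N_MIX, S_MIX])
for residue, freq in sorted(nns.items(), key=lambda item: -item[1]):
    print(residue, round(freq, 4))
```

The same function can be applied to the SNG, RNS, and NNT codons by passing the corresponding per-position mixes.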

The codon for ITI-DI position 13 (CCC, PRO) is
changed to SNG in ITIMUT. This codon encodes the eight
residues PRO, VAL, GLU, ALA, GLY, LEU, GLN, and ARG.
The encoded group includes the parental residue (PRO) as
well as the more commonly observed variants at the
position, ARG and LEU (see Table 15), and also provides
for the occurrence of acidic (GLU), large polar (GLN)
and nonpolar (VAL), and small (ALA, GLY) residues.

The codons for positions 15 and 17 (ATG, MET) are 
changed to the optimized NNS codon. All 20 natural 
amino acid residues and a translation termination are
allowed.

The codon for position 16 (GGA, GLY) is changed to
RNS in ITIMUT. This codon encodes the twelve amino
acids GLY, ALA, ASP, GLU, VAL, MET, ILE, THR, SER, ARG,
ASN, and LYS. The encoded group includes the most
commonly observed residues at this position, ALA and
GLY, and provides for the occurrence of both positively
(ARG, LYS) and negatively (GLU, ASP) charged amino
acids. Large nonpolar residues are also included (ILE,
MET, VAL).

Finally, at positions 18 and 19, the ITI-DI
sequence is changed from ACC·AGC (THR·SER) to NNT·NNT.
The NNT codon encodes the fifteen amino acid residues
PHE, SER, TYR, CYS, LEU, PRO, HIS, ARG, ILE, THR, ASN,
VAL, ALA, ASP, and GLY. This group includes the
parental residues; the further advantages of the NNT
codon have been discussed elsewhere.

The ITIMUT DNA sequence encodes a total of:

8 * 20 * 12 * 20 * 15 * 15 = 8,640,000

different protein sequences in a total of:

2^25 = 33,554,432

different DNA sequences. The total number of protein
sequences encoded by ITIMUT is only 7.4-fold fewer than
the total possible number of natural sequences obtained
from variation at six positions (20^6 = 6.4 x 10^7).
However, this degree of variation in protein sequence is
obtained from a minimum of 1.07 x 10^9 (NNS^6 = 2^30) DNA
sequences, a 32-fold greater number than that comprising
ITIMUT. Thus, ITIMUT is an efficient vehicle for the
generation of a large and diverse population of
potential binding proteins.
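
The diversity figures for ITIMUT can be reproduced by enumerating the degenerate codons against the standard genetic code. The sketch below is illustrative only: the dictionary of codon patterns follows the positions named in the text (13: SNG, 15 and 17: NNS, 16: RNS, 18 and 19: NNT), and helper names such as `expand` and `residues` are not from the source.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Bases allowed by each symbol used in the ITIMUT codon patterns.
MIXES = {"N": "ACGT", "R": "AG", "S": "CG", "G": "G", "T": "T"}

# Codon pattern at each variegated ITI-DI position, as described in the text.
ITIMUT_CODONS = {13: "SNG", 15: "NNS", 16: "RNS", 17: "NNS", 18: "NNT", 19: "NNT"}


def expand(pattern):
    """All concrete codons allowed by a degenerate codon pattern."""
    return ["".join(bases) for bases in product(*(MIXES[ch] for ch in pattern))]


def residues(pattern):
    """Set of amino acids (one-letter codes) encoded, excluding stop."""
    return {CODON_TABLE[codon] for codon in expand(pattern)} - {"*"}


n_protein = prod(len(residues(p)) for p in ITIMUT_CODONS.values())
n_dna = prod(len(expand(p)) for p in ITIMUT_CODONS.values())

print("protein sequences:", n_protein)
print("DNA sequences:", n_dna)
print("20^6 / protein sequences:", round(20**6 / n_protein, 1))
print("NNS^6 / DNA sequences:", 32**6 // n_dna)
```

Counting residues without stop codons matches the way the text tallies 20 outcomes for NNS even though that codon also allows one termination codon.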

EXAMPLE XI 

DEVELOPMENT AND SELECTION OF BPTI MUTANTS FOR 
BINDING TO HORSE HEART MYOGLOBIN (HHMB) 

The following example is hypothetical and
illustrates alternative embodiments of the invention not
given in other examples.

HHMb is chosen as a typical protein target; any
other protein could be used. HHMb satisfies all of the
criteria for a target: 1) it is large enough to be
applied to an affinity matrix, 2) after attachment it is
not reactive, and 3) after attachment there is
sufficient unaltered surface to allow specific binding
by PBDs.

The essential information for HHMb is known: 1)
HHMb is stable at least up to 70°C, between pH 4.4 and
9.3, 2) HHMb is stable up to 1.6 M Guanidinium Cl, 3)
the pI of HHMb is 7.0, 4) for HHMb, M_r = 16,000, 5) HHMb
requires haem, 6) HHMb has no proteolytic activity.

In addition, the following information about HHMb
and other myoglobins is available: 1) the sequence of
HHMb is known, 2) the 3D structure of sperm whale
myoglobin is known; HHMb has 19 amino acid differences and
it is generally assumed that the 3D structures are
almost identical, 3) HHMb has no enzymatic activity, 4)
HHMb is not toxic.

We set the specifications of an SBD as:
1) T = 25°C; 2) pH = 8.0; 3) Acceptable solutes ((A) for
binding: i) phosphate, as buffer, 0 to 20 mM, and ii)
KCl, 10 mM; (B) for column elution: i) phosphate, as
buffer, 0 to 30 mM, ii) KCl, up to 5 M, and iii)
Guanidinium Cl, up to 0.8 M); 4) Acceptable K_d < 1.0 x
10^-8 M.

As stated in Sec. III.B, the residues to be varied
are picked, in part, through the use of interactive
computer graphics to visualize the structures. In this
example, all residue numbers refer to BPTI. We pick a
set of residues that forms a surface such that all
residues can contact one target molecule. Information
that we refer to during the process of choosing residues
to vary includes: 1) the 3D structure of BPTI, 2)
solvent accessibility of each residue as computed by the
method of Lee and Richards (LEEB71), 3) a compilation of
sequences of other proteins homologous to BPTI, and 4)
knowledge of the structural nature of different amino
acid types.

Tables 16 and 34 indicate which residues of BPTI:
a) have substantial surface exposure, and b) are known
to tolerate other amino acids in other closely related
proteins. We use interactive computer graphics to pick
sets of eight to twenty residues that are exposed and 
variable and such that all members of one set can touch 
a molecule of the target material at one time. If BPTI 
has a small amino acid at a given residue, that amino 
acid may not be able to contact the target 
simultaneously with all the other residues in the 
interaction set, but a larger amino acid might well make 
contact. A charged amino acid might affect binding 
without making direct contact. In such cases, the 
residue should be included in the interaction set, with 
a notation that larger residues might be useful. In a
similar way, large amino acids near the geometric center 
of the interaction set may prevent residues on either 
side of the large central residue from making 
simultaneous contact. If a small amino acid, however, 
were substituted for the large amino acid, then the
surface would become flatter and residues on either side
could make simultaneous contact. Such a residue should 
be included in the interaction set with a notation that 
small amino acids may be useful. 

Table 35 was prepared from standard model parts and
shows the maximum span between Cβ and the tip of each
type of side group. Cβ is used because it is rigidly
attached to the protein main-chain; rotation about the
Cα-Cβ bond is the most important degree of freedom for
determining the location of the side group.

Table 34 indicates five surfaces that meet the 
given criteria. The first surface comprises the set of 
residues that actually contacts trypsin in the complex 
of trypsin with BPTI as reported in the Brookhaven 
Protein Data Bank entry 1TPA. This set is indicated
by the number "1". The exposed surface of the residues
in this set (taken from Table 16) totals 1148 Å^2.
Although this is not strictly the area of contact 
between BPTI and trypsin, it is approximately the same. 

Other surfaces, numbered 2 to 5, were picked by 
first picking one exposed, variable residue and then 
picking neighboring residues until a surface was 
defined. The choice of sets of residues shown in Table 
34 is in no way exhaustive or unique; other sets of 
variable, surface residues can be picked. Set #2 is 
shown in stereo view, Figure 14, including the α carbons
of BPTI, the disulfide linkages, and the side groups of
the set. We take the orientation of BPTI in Figure 14
as a standard orientation and hereinafter refer

as being at the top of the molecule, while the carboxy
and amino termini are at the bottom.

Solvent accessibilities are useful, easily
tabulated indicators of a residue's exposure. Solvent
accessibilities must be used with some caution; small
amino acids are under-represented and large amino acids
over-represented. The user must consider what the
solvent accessibility of a different amino acid would be
when substituted into the structure of BPTI.

To create specific binding between a derivative of
BPTI and HHMb, we will vary the residues in set #2.
This set includes the twelve principal residues 17(R),
19(I), 21(Y), 27(A), 28(G), 29(L), 31(Q), 32(T), 34(V),
48(A), 49(E), and 52(M) (Sec. III.B). None of the
residues in set #2 is completely conserved in the sample
of sequences reported in Table 34; thus we can vary them
with a high probability of retaining the underlying
structure. Independent substitution at each of these 
twelve residues of the amino acid types observed at that 
residue would produce approximately 4.4-10 amino acid 
sequences and the same number of surfaces. 

BPTI is a very basic protein. This property has
been used in isolating and purifying BPTI and its
homologues so that the high frequency of arginine and
lysine residues may reflect bias in isolation and is not
necessarily required by the structure. Indeed, SCI-III
from Bombyx mori contains seven more acidic than basic
groups (SASA84).

Residue 17 is highly variable and fully exposed and 
can contain R, K, A, Y, H, F, L, M, T, G, Y, P, or S. 
All types of amino acids are seen: large, small, 
charged, neutral, and hydrophobic. That no acidic 
groups are observed may be due to bias in the sample. 

Residue 19 is also variable and fully exposed, 
containing P, R, I, S, K, Q, and L. 

Residue 21 is not very variable, containing F or Y 
in 31 of 33 cases and I and W in the remaining cases. 
The side group of Y21 fills the space between T32 and 
the main chain of residues 47 and 48. The OH at the tip 
of the Y side group projects into the solvent. Clearly 
one can vary the surface by substituting Y or F so that 
the surface is either hydrophobic or hydrophilic in that 
region. It is also possible that the other aromatic 
amino acid (viz. H) or the other hydrophobics (L, M, or
V) might be tolerated. 

Residue 27 most often contains A, but S, K, L, and 
T are also observed. On structural grounds, this 
residue will probably tolerate any hydrophilic amino 
acid and perhaps any amino acid. 

Residue 28 is G in BPTI. This residue is in a
turn, but is not in a conformation peculiar to glycine.
Six other types of amino acids have been observed at
this residue: K, N, Q, R, H, and N. Small side groups
at this residue might not contact HHMb simultaneously
with residues 17 and 34. Large side groups could
interact with HHMb at the same time as residues 17 and
34. Charged side groups at this residue could affect
binding of HHMb on the surface defined by the other
residues of the principal set. Any amino acid, except
perhaps P, should be tolerated.

Residue 29 is highly variable, most often containing
L. This fully exposed position will probably
tolerate almost any amino acid except, perhaps, P.

Residues 31, 32, and 34 are highly variable,
exposed, and in extended conformations; any amino acid
should be tolerated.

Residues 48 and 49 are also highly variable and
fully exposed; any amino acid should be tolerated.

Residue 52 is in an α helix. Any amino acid,
except perhaps P, might be tolerated.

Now we consider possible variation of the secondary
set (Sec. 13.1.2) of residues that are in the
neighborhood of the principal set. Neighboring residues
that might be varied at later stages include 9(P),
11(T), 15(K), 16(A), 18(I), 20(R), 22(F), 24(N), 26(K),
35(Y), 47(S), 50(D), and 53(R).

Residue 9 is highly variable, extended, and
exposed. Residue 9 and residues 48 and 49 are separated
by a bulge caused by the ascending chain from residue 31
to 34. For residue 9 and residues 48 and 49 to
contribute simultaneously to binding, either the target
must have a groove into which the chain from 31 to 34
can fit, or all three residues (9, 48, and 49) must have
large amino acids that effectively reduce the radius of
curvature of the BPTI derivative.

Residue 11 is highly variable, extended, and 
exposed. Residue 11, like residue 9, is slightly far 
from the surface defined by the principal residues and 
will contribute to binding in the same circumstances. 

Residue 15 is highly varied. The side group of
residue 15 points away from the face defined by set #2.
Changes of charge at residue 15 could affect binding on
the surface defined by residue set #2.

Residue 16 is varied but points away from the
surface defined by the principal set. Changes in charge
at this residue could affect binding on the face defined
by set #2.

Residue 18 is I in BPTI. This residue is in an
extended conformation and is exposed. Five other amino
acids have been observed at this residue: M, F, L, V,
and T. Only T is hydrophilic. The side group points
directly away from the surface defined by residue set
#2. Substitution of charged amino acids at this residue
could affect binding at the surface defined by residue set
#2.

Residue 20 is R in BPTI. This residue is in an
extended conformation and is exposed. Four other amino
acids have been observed at this residue: A, S, L, and
Q. The side group points directly away from the surface
defined by residue set #2. Alteration of the charge at
this residue could affect binding at the surface defined by
residue set #2.

Residue 22 is only slightly varied, being Y, F, or
H in 30 of 33 cases. Nevertheless, A, N, and S have
been observed at this residue. Amino acids such as L,
M, I, or Q could be tried here. Alterations at residue
22 may affect the mobility of residue 21; changes in
charge at residue 22 could affect binding at the surface
defined by residue set #2.

Residue 24 shows some variation, but probably
cannot interact with one molecule of the target
simultaneously with all the residues in the principal set.
Variation in charge at this residue might have an effect
on binding at the surface defined by the principal set.

Residue 26 is highly varied and exposed. Changes 
in charge may affect binding at the surface defined by 
residue set #2; substitutions may affect the mobility of 
residue 27 that is in the principal set. 

Residue 35 is most often Y; W has also been observed.
The side group of 35 is buried, but substitution of F or
W could affect the mobility of residue 34.

Residue 47 is always T or S in the sequence sample 
used. The O gamma probably accepts a hydrogen bond from 
the NH of residue 50 in the alpha helix. Nevertheless, 
there is no overwhelming steric reason to preclude other 
amino acid types at this residue. In particular, other 
amino acids the side groups of which can accept hydrogen 
bonds, viz. N, D, Q, and E, may be acceptable here.

Residue 50 is often an acidic amino acid, but other
amino acids are possible. 

Residue 53 is often R, but other amino acids have
been observed at this residue. Changes of charge may
affect binding to the amino acids in interaction set #2.

Stereo Figure 14 shows the residues in set #2, plus
R39. From Figure 14, one can see that R39 is on the
opposite side of BPTI from the surface defined by the
residues in set #2. Therefore, variation at residue 39
at the same time as variation of some residues in set #2
is much less likely to improve binding that occurs along
surface #2 than is variation of the other residues in
set #2.

In addition to the twelve principal residues and 13
secondary residues, there are two other residues, 30(C)
and 33(F), involved in surface #2 that we will probably
not vary, at least not until late in the procedure.
These residues have their side groups buried inside BPTI
and are conserved. Changing these residues does not
change the surface nearly so much as does changing
residues in the principal set. These buried, conserved
residues do, however, contribute to the surface area of
surface #2. The surface of residue set #2 is comparable
to the area of the trypsin-binding surface. Principal
residues 17, 19, 21, 27, 28, 29, 31, 32, 34, 48, 49, and
52 have a combined solvent-accessible area of 946.9 Å^2.
Secondary residues 9, 11, 15, 16, 18, 20, 22, 24, 26,
35, 47, 50, and 53 have combined surface of 1041.7 Å^2.
Residues 30 and 33 have exposed surface totaling 38.2 Å^2.
Thus the three groups' combined surface is 2026.8 Å^2.

Residue 30 is C in BPTI and is conserved in all
homologous sequences. It should be noted, however, that
C14/C38 is conserved in all natural sequences, yet Marks
et al. (MARK87) showed that changing both C14 and C38 to
A,A or T,T yields a functional trypsin inhibitor. Thus
it is possible that BPTI-like molecules will fold if C30
is replaced.

Residue 33 is F in BPTI and in all homologous
sequences. Visual inspection of the BPTI structure
suggests that substitution of Y, M, H, or L might be
tolerated.

Having identified twenty residues that define a
possible binding surface, we must choose some to vary
first. Assuming a hypothetical affinity separation
sensitivity, C_sensi, of 1 in 4 x 10^8, we decide to vary six
residues (leaving some margin for error in the actual
base composition of variegated bases). To obtain
maximal recognition, we choose residues from the
principal set that are as far apart as possible. Table
36 shows the distances between the β carbons of residues
in the principal and peripheral set. R17 and V34 are at
one end of the principal surface. Residues A27, G28,
L29, A48, E49, and M52 are at the other end, about
twenty Angstroms away; of these, we will vary residues
17, 27, 29, 34, and 48. Residues 28, 49, and 52 will be
varied at later rounds.

Of the remaining principal residues, 21 is left to
later variations. Among residues 19, 31, and 32, we
arbitrarily pick 19 to vary.

Unlimited variation of six residues produces 6.4 x 10^7
amino acid sequences. By hypothesis, C_sensi is 1 in
4 x 10^8. Table 37 shows the programmed variegation at the
chosen residues. The parental sequence is present as 1
part in 5.5 x 10^7, but the least favored sequences are
present at only 1 part in 4.2 x 10^9. Among single-amino-acid
substitutions from the PPBD, the least favored is
F17-I19-A27-L29-V34-A48 and has a calculated abundance
of 1 part in 1.6 x 10^8. Using the optimal qfk codon, we
can recover the parental sequence and all one-amino-acid
substitutions to the PPBD if actual nt compositions come
within 5% of programmed compositions. The number of
transformants is M_ntv = 1.0 x 10^9 (also by hypothesis), thus
we will produce most of the programmed sequences.
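
One way to read these abundances is to compare each against the hypothetical separation sensitivity and to estimate how likely a sequence is to occur at least once among the transformants. The sketch below does both. The Poisson sampling model is an assumption introduced here for illustration and is not stated in the text; the abundances, C_sensi, and M_ntv are the hypothetical values quoted above.

```python
import math

M_NTV = 1.0e9          # number of transformants (by hypothesis)
C_SENSI = 1.0 / 4.0e8  # affinity separation sensitivity (by hypothesis)

# Calculated abundances quoted in the text.
ABUNDANCE = {
    "parental": 1.0 / 5.5e7,
    "least favored single substitution": 1.0 / 1.6e8,
    "least favored overall": 1.0 / 4.2e9,
}


def prob_represented(abundance, n_transformants):
    """P(sequence occurs at least once), ASSUMING Poisson sampling."""
    return 1.0 - math.exp(-abundance * n_transformants)


def recoverable(abundance, sensitivity):
    """True if the sequence is at least as abundant as the sensitivity limit."""
    return abundance >= sensitivity


report = {
    name: (prob_represented(a, M_NTV), recoverable(a, C_SENSI))
    for name, a in ABUNDANCE.items()
}
for name, (p, ok) in report.items():
    print(f"{name}: P(present) = {p:.3f}, above C_sensi = {ok}")
```

Under this model the parental sequence and every single substitution are almost certainly present and lie above the sensitivity limit, while the least favored sequences are present only some of the time and fall below it.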

The residue numbers of the preceding section are
referred to mature BPTI (R1-P2-...-A58). Table 25 has
residue numbers referring to the pre-M13CP-BPTI protein;
all mature BPTI sequence numbers have been increased by
the length of the signal sequence, i.e. 23. Thus in
terms of the pre-OSP-PBD residue numbers, we wish to
vary residues 40, 42, 50, 52, 57, and 71. A DNA
subsequence containing all these codons is found between
the (Apa I/Dra II/Pss I) sites at base 191 and the Sph I
site at base 309 of the osp-pbd gene. Among Apa I, Dra II,
and Pss I, Apa I is preferred because it recognizes six

are synthesized on the sense (bottom) strand. The
design calls for "qfk" in the antisense strand, so that
the sense strand contains (from 5' to 3') a) equal parts
C and A (i.e. the complement of k), b) (0.40 T, 0.22 A,
0.22 C, and 0.16 G) (i.e. the complement of f), and c)
(0.26 T, 0.26 A, 0.30 C, and 0.18 G).
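
The relation between the mixes synthesized on one strand and the "qfk" codon on the other can be sketched as a reverse complement of nucleotide mixes. The three sense-strand mixes below are taken from the text. Reading the third mix as the complement of q is an inference, since the text labels only the complements of k and f explicitly; the helper name `complement_mix` is illustrative.

```python
# Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Mixes on the synthesized strand, listed 5' to 3' as in the text.
SENSE_MIXES_5_TO_3 = [
    {"C": 0.50, "A": 0.50},                          # complement of k
    {"T": 0.40, "A": 0.22, "C": 0.22, "G": 0.16},    # complement of f
    {"T": 0.26, "A": 0.26, "C": 0.30, "G": 0.18},    # ASSUMED complement of q
]


def complement_mix(mix):
    """Base composition on the opposite strand at the same position."""
    return {COMPLEMENT[base]: freq for base, freq in mix.items()}


# Reading the opposite strand 5' to 3' reverses the order of positions.
q, f, k = [complement_mix(mix) for mix in reversed(SENSE_MIXES_5_TO_3)]

print("q:", q)
print("f:", f)
print("k:", k)
```

With these inputs, k comes out as an equal mix of G and T, and f and q are A-rich and G-rich respectively.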

Each residue that is encoded by "qfk" has 21
possible outcomes, each of the amino acids plus stop.
Table 12 gives the distribution of amino acids encoded
by "qfk", assuming 5% errors. The abundance of the
parental sequence is the product of the abundances of
R x I x A x L x V x A. The abundance of the least-favored
sequence is 1 in 4.2 x 10^9.

Olig#27 and olig#28 are annealed and extended with
Klenow fragment and all four (nt)TPs. Both the ds
synthetic DNA and RF pLG7 DNA are cut with both Apa I and
Sph I. The cut DNA is purified and the appropriate
pieces ligated (See Sec. 14.1) and used to transform
competent PE383 (Sec. 14.2). In order to generate a
sufficient number of transformants, V_c is set to 5000 ml.

1) culture E. coli in 5.0 l of LB broth at 37°C until
cell density reaches 5 x 10^7 to 7 x 10^7 cells/ml,

2) chill on ice for 65 minutes, centrifuge the cell
suspension at 4000 g for 5 minutes at 4°C,

3) discard supernatant; resuspend the cells in 1667 ml
of an ice-cold, sterile solution of 60 mM CaCl2,

4) chill on ice for 15 minutes, and then centrifuge at
4000 g for 5 minutes at 4°C,

5) discard supernatant; resuspend cells in 2 x 400 ml
of ice-cold, sterile 60 mM CaCl2; store cells at
4°C for 24 hours,

6) add DNA in ligation or TE buffer; mix and store on
ice for 30 minutes; 20 ml of solution containing 5
µg/ml of DNA is used,

7) heat shock cells at 42°C for 90 seconds,

8) add 200 ml LB broth and incubate at 37°C for 1
hour,

9) add the culture to 2.0 l of LB broth containing
ampicillin at 35-100 µg/ml and culture for 2 hours
at 37°C,

10) centrifuge at 8000 g for 20 minutes at 4°C,

11) discard supernatant, resuspend cells in 50 ml of LB
broth plus ampicillin and incubate 1 hour at 37°C,

12) plate cells on LB agar containing ampicillin,

13) harvest virions by method of Salivar et al.
(SALI64).

The heat shock of step (7) can be done by dividing the
200 ml into 100 aliquots of 200 µl in 1.5 ml plastic
Eppendorf tubes. It is possible to optimize the heat
shock for other volumes and kinds of container. It is
important to: a) use all or nearly all the vgDNA
synthesized in ligation (this will require large amounts
of pLG7 backbone), b) use all or nearly all the ligation
mixture to transform cells, and c) culture all or nearly
all the transformants at high density. These measures
are directed at maintaining diversity.

IPTG is added to the growth medium at 2.0 mM (the
optimal level) and virions are harvested in the usual
way. It is important to collect virions in a way that
samples all or nearly all the transformants. Because F^-
cells are used in the transformation, multiple
infections do not pose a problem.

HHMb has a pI of 7.0 and we carry out
chromatography at pH 8.0 so that HHMb is slightly
negative while BPTI and most of its mutants are
positive. HHMb is fixed (Sec. V.F) to a 2.0 ml column
on Affi-Gel 10 (TM) or Affi-Gel 15 (TM) at 4.0 mg/ml support
matrix, the same density that is optimal for a column
supporting trp.

We note that charge repulsion between BPTI and HHMb
should not be a serious problem and does not impose any
constraints on ions or solutes allowed as eluants.
Neither BPTI nor HHMb has special requirements that
constrain choice of eluants. The eluant of choice is
KCl in varying concentrations.

To remove variants of BPTI with strong,
indiscriminate binding for any protein or for the
support matrix, we pass the variegated population of
virions over a column that supports bovine serum albumin
(BSA) before loading the population onto the {HHMb}
column. Affi-Gel 10 (TM) or Affi-Gel 15 (TM) is used to
immobilize BSA at the highest level the matrix will
support. A 10.0 ml column is loaded with 5.0 ml of
Affi-Gel-linked-BSA; this column, called {BSA}, has V_v =
5.0 ml. The variegated population of virions containing
10^12 pfu in 1 ml (0.2 x V_v) of 10 mM KCl, 1 mM phosphate,
pH 8.0 buffer is applied to {BSA}. We wash {BSA} with
4.5 ml (0.9 x V_v) of 50 mM KCl, 1 mM phosphate, pH 8.0
buffer. The wash with 50 mM salt will elute virions
that adhere slightly to BSA but not virions with strong
binding. The pooled effluent of the {BSA} column is 5.5
ml of approximately 13 mM KCl.

The column {HHMb} is first blocked by treatment
with 10^11 virions of M13(am429) in 100 µl of 10 mM KCl
buffered to pH 8.0 with phosphate; the column is washed
with the same buffer until OD260 returns to base line or
2 x V_v have passed through the column, whichever comes
first. The pooled effluent from {BSA} is added to
{HHMb} in 5.5 ml of 13 mM KCl, 1 mM phosphate, pH 8.0
buffer. The column is eluted in the following way:

1) 10 mM KCl buffered to pH 8.0 with phosphate, until
optical density at 280 nm falls to base line or 2 x
V_v, whichever is first, (effluent discarded),

2) a gradient of 10 mM to 2 M KCl in 3 x V_v, pH held at
8.0 with phosphate, (30 x 100 µl fractions),

3) a gradient of 2 M to 5 M KCl in 3 x V_v, phosphate
buffer to pH 8.0 (30 x 100 µl fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl in
2 x V_v, with phosphate buffer to pH 8.0, (20 x 100 µl
fractions), and

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1 x
V_v, with phosphate buffer to pH 8.0, (10 x 100 µl
fractions).

In addition to the elution fractions, a sample is
removed from the column and used as an inoculum for
phage-sensitive Sup- cells (Sec. V). A sample of 4 µl
from each fraction is plated on phage-sensitive Sup-
cells. Fractions that yield too many colonies to count
are replated at higher dilution. An approximate titre of
each fraction is calculated. Starting with the last
fraction and working toward the first fraction that was
titered, we pool fractions until approximately 10^9 phage
are in the pool, i.e. about 1 part in 1000 of the phage
applied to the column. This population is infected into
3·10^11 phage-sensitive PE384 in 300 ml of LB broth. The
very low multiplicity of infection (moi) is chosen to
reduce the possibility of multiple infection. After
thirty minutes, viable phage have entered recipient
cells but have not yet begun to produce new phage.
Phage-borne genes are expressed at this phase, and we can
add ampicillin that will kill uninfected cells. These
cells still carry F-pili and will adsorb phage, helping
to prevent multiple infections.
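The effect of the low multiplicity of infection can be estimated with a short calculation. This sketch assumes that infection events are Poisson-distributed over cells and that every pooled phage is viable and adsorbs; neither assumption is made in the text, and the variable names are illustrative.

```python
import math

phage = 1e9    # phage in the pool (from the text)
cells = 3e11   # phage-sensitive PE384 cells (from the text)

moi = phage / cells  # multiplicity of infection, about 0.0033

# Poisson model (assumption): P(k infections per cell) = exp(-moi) * moi**k / k!
p_none = math.exp(-moi)
p_single = moi * math.exp(-moi)
p_infected = 1.0 - p_none
p_multiple = 1.0 - p_none - p_single

# Fraction of infected cells that received more than one phage.
multiple_among_infected = p_multiple / p_infected

if __name__ == "__main__":
    print(f"moi = {moi:.4f}")
    print(f"fraction of infected cells with >1 phage = {multiple_among_infected:.5f}")
```

Under these assumptions the multiply infected fraction is close to moi/2, i.e. on the order of one or two infected cells per thousand.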

If multiple infection should pose a problem that
cannot be solved by growth at low multiplicity of infection
on F+ cells, the following procedure can be employed to
obviate the problem. Virions obtained from the affinity
separation are infected into F+ E. coli and cultured to
amplify the genetic messages (Sec. V). CCC DNA is
obtained either by harvesting RF DNA or by in vitro
extension of primers annealed to ss phage DNA. The CCC
DNA is used to transform F- cells at a high ratio of
cells to DNA. Individual virions obtained in this way
should bear only proteins encoded by the DNA within.

The phagemid population is grown and chromatographed
three times and then examined for SBDs (Sec. V).
In each separation cycle, phage from the last three
fractions that contain viable phage are pooled with
phage obtained by removing some of the support matrix as
an inoculum. At each cycle, about 10^12 phage are loaded
onto the column and about 10^9 phage are cultured for the
next separation cycle. After the third separation
cycle, SBD colonies are picked from the last fraction
that contained viable phage.

Each of the SBDs is cultured and tested for
retention on a Pep-Tie column supporting HHMb. The
phage showing the greatest retention on the Pep-Tie
{HHMb} column is designated SBD1. This SBD1 becomes the
parental amino-acid sequence to the second variegation
cycle.

Assume for the sake of argument that, in SBD1, R40
changed to D, I42 changed to Q, A50 changed to E, L52
remained L, and A71 changed to W (see Table 38). If so,
a rational plan for the second round of variegation
would be that which is set forth in Table 39. The
residues to be varied are chosen by: a) choosing some of
the residues in the principal set that were not varied
in the first round (viz. residues 42, 44, 51, 54, 55,
72, or 75 of the fusion), and b) choosing some residues
in the secondary set. Residues 51, 54, 55, and 72 are
varied through all twenty amino acids and, unavoidably,
stop. Residue 44 is only varied between Y and F. Some
residues in the secondary set are varied through a
restricted range, primarily to allow different charges
(+, 0, -) to appear. Residue 38 is varied through K, R,
E, or G. Residue 41 is varied through I, V, K, or E.
Residue 43 is varied through R, S, G, N, K, D, E, T, or
A.

Now assume that in the most successful SBD of the
second round of variegation (SBD-2), residue 38 (K15 of
BPTI) changed to E, 41 becomes V, 43 goes to N, 44 goes
to F, 51 goes to F, 54 goes to S, 55 goes to A, and 72
goes to Q (see Table 40). A third round of variation is
illustrated in Table 41; eight amino acids are varied.
Those in the principal set, residues 40, 55, and 57, are
varied through all twenty amino acids. Residue 32 is
varied through P, Q, T, K, A, or E. Residue 34 is
varied through T, P, Q, K, A, or E. Residue 44 is
varied through F, L, Y, C, W, or stop. Residue 50 is
varied through E, K, or Q. Residue 52 is varied through
L, F, I, M, or V. The result of this variation is shown
in Table 42.

This example is hypothetical. It is anticipated
that more variegation cycles will be needed to achieve
dissociation constants of 10^-8 M. It is also possible
that more than three separation cycles will be needed in
some variegation cycles. Real DNA chemistry and DNA
synthesizers may have larger errors than our hypothetical
5%. If S_err > 0.05, then we may not be able to
vary six residues at once. Variation of 5 residues at
once is certainly possible.

EXAMPLE XII 

DESIGN AND MUTAGENESIS OF A CLASS 1 MINI-PROTEIN

To obtain a library of binding domains that are
conformationally constrained by a single disulfide, we
insert DNA coding for the following family of mini-
proteins into the gene coding for a suitable OSP.

    X1-X2-C-X3-X4-X5-X6-C-X7-X8   (SEQ ID NO:19)
          |_____________|

where |___| indicates disulfide bonding; this mini-
protein is depicted in Figure 3. Disulfides normally do
not form between cysteines that are consecutive on the
polypeptide chain. One or more of the residues
indicated above as Xn will be varied extensively to
obtain novel binding. There may be one or more amino
acids that precede X1 or follow X8; however, these
additional residues will not be significantly
constrained by the diagrammed disulfide bridge, and it
is less advantageous to vary these remote, unbridged
residues. The last X residue is connected to the OSP of
the genetic package.

X1, X2, X3, X4, X5, X6, X7, and X8 can be varied
independently; i.e., a different scheme of variegation
could be used at each position. X1 and X8 are the least
constrained residues and may be varied less than other
positions.

X1 and X8 can be, for example, one of the amino
acids [E, K, T, and A]; this set of amino acids is
preferred because: a) the possibility of positively
charged, negatively charged, and neutral amino acids is
provided, b) these amino acids can be provided in
1:1:1:1 ratio via the codon RMG (R = equimolar A and G,
M = equimolar A and C), and c) these amino acids allow
proper processing by signal peptidases.
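The 1:1:1:1 claim for RMG can be checked by expanding the mixed codon against the standard genetic code. The sketch below is illustrative only; `CODON_TABLE`, `IUB`, and `expand` are hypothetical names, and equimolar mixing at each ambiguous position is assumed, as stated in the text.

```python
from collections import Counter
from itertools import product

# Standard genetic code with bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Single-letter IUB ambiguity codes for DNA.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def expand(mixed_codon):
    """Count how many DNA codons of a mixed codon encode each amino acid."""
    codons = ("".join(bases) for bases in product(*(IUB[b] for b in mixed_codon)))
    return Counter(CODON_TABLE[c] for c in codons)


if __name__ == "__main__":
    # RMG expands to AAG, ACG, GAG, GCG: one codon each for K, T, E, and A.
    print(sorted(expand("RMG").items()))
```

Because each of the four amino acids is reached by exactly one codon, equimolar nucleotide mixtures give equal amino-acid frequencies.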

One option for variegation of X2, X3, X4, X5, X6, and
X7 is to vary all of these in the same way. For example,
each of X2, X3, X4, X5, X6, and X7 can be chosen from the
set [F, S, Y, C, L, P, H, R, I, T, N, V, A, D, and G]
which is encoded by the mixed codon NNT. Tables 10 and
130 compare libraries in which six codons have been
varied either by NNT or NNK codons. NNT encodes 15
different amino acids and only 16 DNA sequences. Thus,
there are 1.139·10^7 amino-acid sequences, no stops, and
only 1.678·10^7 DNA sequences. A library of 10^8
independent transformants will contain 99% of all
possible sequences. The NNK library contains 6.4·10^7
amino-acid sequences, but complete sampling requires a
much larger number of independent transformants.
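The library sizes quoted above follow from simple powers, and the sampling claim can be approximated with a Poisson model. This is a rough sketch: it assumes every DNA sequence is equally likely in each independent transformant, which real synthesis only approximates, and the function name `dna_coverage` is hypothetical.

```python
import math

N_VARIED = 6  # six codons varied

# NNT: 16 DNA codons encoding 15 amino acids, no stop codons.
nnt_dna = 16 ** N_VARIED
nnt_aa = 15 ** N_VARIED

# NNK: 4 * 4 * 2 = 32 DNA codons encoding all 20 amino acids (plus one stop).
nnk_dna = 32 ** N_VARIED
nnk_aa = 20 ** N_VARIED


def dna_coverage(n_transformants, n_dna_sequences):
    """Expected fraction of DNA sequences present at least once (Poisson approximation)."""
    return 1.0 - math.exp(-n_transformants / n_dna_sequences)


if __name__ == "__main__":
    print(f"NNT: {nnt_aa:.3e} amino-acid seqs, {nnt_dna:.3e} DNA seqs")
    print(f"NNK: {nnk_aa:.3e} amino-acid seqs, {nnk_dna:.3e} DNA seqs")
    print(f"NNT coverage at 1e8 transformants: {dna_coverage(1e8, nnt_dna):.4f}")
    print(f"NNK coverage at 1e8 transformants: {dna_coverage(1e8, nnk_dna):.4f}")
```

On these assumptions a library of 10^8 transformants covers more than 99% of the NNT DNA sequences but under a tenth of the NNK DNA sequences, which is the contrast the paragraph draws.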
EXAMPLE XIII
A CYS::HELIX::TURN::STRAND::CYS UNIT

The parental Class 2 mini-protein may be a
naturally-occurring Class 2 mini-protein. It may also
be a domain of a larger protein whose structure
satisfies or may be modified so as to satisfy the
criteria of a class 2 mini-protein. The modification
may be a simple one, such as the introduction of a
cysteine (or a pair of cysteines) into the base of a
hairpin structure so that the hairpin may be closed off
with a disulfide bond, or a more elaborate one, such as
the modification of intermediate residues so as to
achieve the hairpin structure. The parental class 2
mini-protein may also be a composite of structures from
two or more naturally-occurring proteins, e.g., an α
helix of one protein and a β strand of a second protein.

One mini-protein motif of potential use comprises a
disulfide loop enclosing a helix, a turn, and a return
strand. Such a structure could be designed or it could
be obtained from a protein of known 3D structure.
Scorpion neurotoxin, variant 3, (ALMA83a, ALMA83b)
(hereafter ScorpTx) contains a structure diagrammed in
Figure 15 that comprises a helix (residues N22 through
N33), a turn (residues 33 through 35), and a return
strand (residues 36 through 41). ScorpTx contains
disulfides that join residues 12-65, 16-41, 25-46, and
29-48. CYS25 and CYS41 are quite close and could be
joined by a disulfide without deranging the main chain.
Figure 15 shows CYS25 joined to CYS41. In addition, CYS29
has been changed to GLN. It is expected that a disulfide
will form between 25 and 41 and that the helix shown will
form; we know that the amino-acid sequence shown is highly
compatible with this structure. The presence of GLY35,
GLY36, and GLY39 gives the turn and extended strand
sufficient flexibility to accommodate any changes needed
around CYS41 to form the disulfide.

From examination of this structure (as found in entry
1SN3 of the Brookhaven Protein Data Bank), we see that the
following sets of residues would be preferred for
variegation:

SET 1

     Residue  Codon  Allowed amino acids              Naa/Ndna
1)   T27      NNG    L^2,R^2,M,V,S,P,T,A,Q,K,E,W,G    13/15
2)   E28      VHG    L,M,V,P,T,A,Q,K,E                9/9
3)   A31      VHG    L,M,V,P,T,A,Q,K,E                9/9
4)   K32      VHG    L,M,V,P,T,A,Q,K,E                9/9
5)   G24      NNG    L^2,R^2,M,V,S,P,T,A,Q,K,E,W,G    13/15
6)   E23      VHG    L,M,V,P,T,A,Q,K,E                9/9
7)   Q34      VAS    H,Q,N,K,E,D                      6/6

Note: Exponents on amino acids indicate multiplicity of
codons.
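The size of the library implied by Set 1 is the product of the per-residue entries in the Naa/Ndna column. The sketch below uses the values 13/15 (NNG), 9/9 (VHG), and 6/6 (VAS); the list name `SET_1` is hypothetical and the per-codon counts are taken as given rather than derived.

```python
import math

# (residue, codon, allowed amino acids, DNA codons) for each varied position in Set 1.
SET_1 = [
    ("T27", "NNG", 13, 15),
    ("E28", "VHG", 9, 9),
    ("A31", "VHG", 9, 9),
    ("K32", "VHG", 9, 9),
    ("G24", "NNG", 13, 15),
    ("E23", "VHG", 9, 9),
    ("Q34", "VAS", 6, 6),
]

# Independent positions multiply.
n_aa = math.prod(row[2] for row in SET_1)
n_dna = math.prod(row[3] for row in SET_1)

if __name__ == "__main__":
    print(f"amino-acid sequences: {n_aa} (about {n_aa:.3e})")
    print(f"DNA sequences:        {n_dna} (about {n_dna:.3e})")
```

The DNA product, 8,857,350, agrees to rounding with the figure of 8.85·10^6 DNA sequences given for this variegation.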

Positions 27, 28, 31, 32, 24, and 23 comprise one
face of the helix. At each of these locations we
have picked a variegating codon that a) includes the
parental amino acid, b) includes a set of residues having
a predominance of helix favoring residues, c) provides for
a wide variety of amino acids, and d) leads to as even a
distribution as possible. [illegible] we allow variation
here and [illegible] amino acids that are compatible with
turns. The variegation shown leads to [illegible]
amino-acid sequences encoded by 8.85·10^6 DNA sequences.

. . -rids Naa/Ndna 
Residu^_Codon Wlowed_a JS H^ £i ^ 

IT^T^ ivjvwi** 13/18 



13/15 



2) 



3 



9/9 
9/9 



H *Q> N ; K > D 2 E 

T 27 NNG LS^IV^f^l 

Q> K ; E ; W ) G> " 

K30 VHG K>;Q7P/>,L iM ,V 

4) A 31 VHG K^Q>,T, A ,L,M,V 

5) K 32 VHG WW> G -' K ' E 4/4 

6) S37 RRT S,N,D,G 

„ v NHT Y J S > -F i H,P,L,N,T | I,D 3 A J V 9/9 

7 ' LitZs «. 2^35/3^-32 :» — .o as 

to „e rL^um 

population. Residues 37 and [illegible] so that we pick
different variegation codons. This variegation allows
4.43·10^6 amino-acid sequences and 7.08·10^6 DNA
sequences. Thus a [illegible] scheme can be sampled very
efficiently.

EXAMPLE XIV
DESIGN AND MUTAGENESIS OF CLASS 3 MINI-PROTEIN

Two Disulfide Bond Parental Mini-Proteins

Mini-proteins with two disulfide bonds may be
modelled after the α-conotoxins, e.g., GI, GIA, GII, MI,
and SI. These have the following conserved structure
(SEQ ID NOs: 20-31), in which cysteines bearing the same
number are joined by a disulfide:

            1 2         1         2
  (1-2 AAs)-C-C-(3 AAs)-C-(5 AAs)-C-(0-5 AAs)

Hashimoto et al. (HASH85) reported synthesis of
twenty-four analogues of α-conotoxins GI, GII, and MI.
Using the numbering scheme for GI (CYS at positions 2,
3, 7, and 13), Hashimoto et al. reported alterations at
4, 8, 10, and 12 that allow the proteins to be toxic.
Almquist et al. (ALMQ89) synthesized [des-GLU1] α-
conotoxin GI and twenty analogues. They found that
substituting GLY for PRO5 gave rise to two isomers,
perhaps related to different disulfide bonding. They
found a number of substitutions at residues 8 through 11
that allowed the protein to be toxic. Zafaralla et al.
(ZAFA88) found that substituting PRO at position 9 gives
an active protein. Each of the groups cited used only
in vivo toxicity as an assay for the activity. From
such studies, one can infer that an active protein has
the parental 3D structure, but one cannot infer that an
inactive protein lacks the parental 3D structure.

Pardi et al. (PARD89) determined the 3D structure
of α-conotoxin GI obtained from venom by NMR. Kobayashi
et al. (KOBA89) have reported a 3D structure of
synthetic α-conotoxin GI from NMR data which agrees with
that of PARD89. We refer to Figure 5 of Pardi et al.
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preferred variegation ^° = snal l hydro 

ALA, ™R, LYS, and GLU .small hyd p ^ ^ ^ 

ph iuc p^-f^ 1 ^ • — us protei : 

pre fer f l--« ' ptot eins having varrous 

having ALA, « toxrc ^ ue use an ,,, 

amino acids at posrtron 9 1 ,c,I,P l »,l',I,'.V> 
variegation codon which asVell. 
R , D , G . We use at ^ ^ „ e . Uow ALA, 

^position 14, followmg the £ variegati on 

- ^ I; £ ^oirseCte, encoded by 
al lows 1.053-10 an.no q and 

DNA sequences. ^ respectively, 

« 4-r-ansf orraants wmr 
5.0-10 7 independent transr 
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d „ 95% of the allowed sequences. 
disP lay -70%, -83*, and ~9 ^ Conce rning a 

other variegations are a ^ gray83 , 

• o see, inter 
conotoxins, see, 

GRAY84, and PARD89 . d be one of the 

Th e parental .aini^ofxn y _ IIH by Pease at 

pro teins desisted -Hybrxd-I ^ ^ set 
al . (PE AS90); cf, otein consists of: 

to ^rr —d^ 



Parental     Variegated  Allowed                         #AA seqs/
amino acid   Codon       Amino acids                     #DNA seqs

A5           RVT         A,D,G,T,N,S                     6/6
P6           VYT         P,T,A,L,I,V                     6/6
E7           RRS         E,D,N,K,S,R,G^2                 7/8
T8           VHG         T,K,E,Q,P,A,L,M,V               9/9
A9           VHG         A,K,E,Q,P,T,L,M,V               9/9
A10          RMG         A,T,K,E                         4/4
K12          VHG         K,E,Q,P,T,A,L,M,V               9/9
Q16          NNG         L^2,R^2,M,V,S,P,T,A,Q,K,E,W,G   13/15

(SEQ ID NO:106)

This provides 9.55·10^6 amino-acid sequences encoded by
[illegible] DNA sequences. [illegible] transformants
allows [illegible] of all possible sequences. At each
position, the parental amino acid is allowed.
At position 5 we provide amino acids that are
compatible with a turn. At position 6 we allow ILE and
VAL because they have branched β carbons and make the
chain rigid. At position 7 we allow ASP, ASN, and SER
that often appear at the amino termini of helices. At
positions 8 and 9 we allow several helix-favoring amino
acids (ALA, LEU, MET, GLN, GLU, and LYS) that have
differing charges and hydrophobicities because these are
part of the helix proper. Position 10 is further around
the edge of the helix, so we allow a smaller set (ALA,
THR, LYS, and GLU). This set not only includes 3 helix-
favoring amino acids plus THR that is well tolerated but
also allows positive, negative, and neutral hydrophilic
side groups. The side groups of 12 and 16 project into
the same region as the residues already recited. At these
positions we allow a wide variety of amino acids with a
bias toward helix-favoring amino acids.

The parental mini-protein may instead be a
polypeptide composed of residues 9-24 and 31-40 of
aprotinin and possessing two disulfides (Cys9-Cys22 and
Cys14-Cys38). Such a polypeptide would have the same
disulfide bond topology as α-conotoxin, and its two
bridges would have spans of 12 and 17, respectively.
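The spans of 12 and 17 can be reproduced by counting the retained residues that lie between each pair of bridged cysteines. This is a small sketch in which "span" is assumed to mean the number of residues strictly between the two cysteines; the text does not define the term at this point, and the names `KEPT` and `span` are hypothetical.

```python
# Aprotinin residues retained in the mini-protein: 9-24 and 31-40 (inclusive).
KEPT = list(range(9, 25)) + list(range(31, 41))


def span(cys_a, cys_b):
    """Number of retained residues strictly between two bridged cysteines (assumed definition)."""
    return sum(1 for residue in KEPT if cys_a < residue < cys_b)


if __name__ == "__main__":
    print("Cys9-Cys22 span:", span(9, 22))
    print("Cys14-Cys38 span:", span(14, 38))
```

Under this definition the Cys14-Cys38 bridge encloses residues 15-24 and 31-37, because residues 25-30 are deleted.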

Residues 23, 24 and 31 are variegated to encode the
amino acid residue set [G,S,R,D,N,H,P,T,A] so that a
sequence that favors a turn of the necessary geometry is
found. We use trypsin or anhydrotrypsin as the affinity
molecule to enrich for GPs that display a mini-protein
that folds into a stable structure similar to BPTI in
the P1 region.

Three Disulfide Bond Parental Mini-Proteins

The cone snails (Conus) produce venoms (conotoxins)
which are 10-30 amino acids in length and exceptionally
rich in disulfide bonds. They are therefore archetypal
mini-proteins. Novel mini-proteins with three
disulfide bonds may be modelled after the µ- (GIIIA,
GIIIB, GIIIC) or ω- (GVIA, GVIB, GVIC, GVIIA, GVIIB,
MVIIA, MVIIB, etc.) conotoxins. The µ-conotoxins have
the following conserved structure (SEQ ID NO:32), in
which cysteines bearing the same number are joined by a
disulfide:

          1 2         3         1         2 3
  (2 AAs)-C-C-(5 AAs)-C-(4 AAs)-C-(4 AAs)-C-C-AA

No 3D structure of a µ-conotoxin has been
published. Hidaka et al. (HIDA90) have established the
connectivity of the disulfides. The following diagram
depicts geographutoxin I (also known as µ-conotoxin
GIIIA), whose sequence is SEQ ID NO:33.

  R1-D2-C3-C4-T5-P6-P7-K8-K9-C10-K11-D12-R13-Q14-
  C15-K16-P17-Q18-R19-C20-C21-A22

  Disulfides: C3::C15, C4::C20, C10::C21

The connection from R19 to C20 could go over or under
the strand from Q14 to C15. One preferred form of
variegation is to vary the residues in one loop.
Because the longest loop contains only five amino acids,
it is appropriate to also vary the residues connected to
the cysteines that form the loop. For example, we might
vary residues 5 through 9 plus 2, 11, 19, and 22.
Another useful variegation would be to vary residues 11-
14 and 16-19, each through eight amino acids.
Concerning µ conotoxins, see BECK89b, BECK89c, CRUZ89,
and HIDA90.

The ω-conotoxins may be represented as follows (SEQ
ID NO: 34 through 39), with cysteines bearing the same
number joined by a disulfide:

  1         2         3 1           2           3
  C-(6 AAs)-C-(6 AAs)-C-C-(2-3 AAs)-C-(4-6 AAs)-C

The King Kong peptide has the same disulfide arrangement
as the ω-conotoxins but a different biological activity.
Woodward et al. (WOOD90) report the sequences of three
homologous proteins from C. textile. Within the mature
toxin domain, only the cysteines are conserved. The
spacing of the cysteines is exactly conserved, but no
other position has the same amino acid in all three
sequences and only a few positions show even pair-wise
matches. Thus we conclude that all positions (except
the cysteines) may be substituted freely with a high
probability that a stable disulfide structure will form.
Concerning ω conotoxins, see HILL89 and SUNX87.

Another mini-protein which may be used as a
parental binding domain is the Cucurbita maxima trypsin
inhibitor I (CMTI-I); CMTI-III is also appropriate.
They are members of the squash family of serine protease
inhibitors, which also includes inhibitors from summer
squash, zucchini, and cucumbers (WIEC85). McWherter et
al. (MCWH89) describe synthetic sequence-variants of the
squash-seed protease inhibitors that have affinity for
human leukocyte elastase and cathepsin G. Of course,
any member of this family might be used.

CMTI-I is one of the smallest proteins known,
comprising only 29 amino acids held in a fixed
conformation by three disulfide bonds. The structure
has been studied by Bode and colleagues using both X-
ray diffraction (BODE89) and NMR (HOLA89a,b). CMTI-I is
of ellipsoidal shape; it lacks helices or β-sheets, but
consists of turns and connecting short polypeptide
stretches. The disulfide pairing is Cys3-Cys20, Cys10-
Cys22 and Cys16-Cys28. In the CMTI-I:trypsin complex
studied by Bode et al., 13 of the 29 inhibitor residues
are in direct contact with trypsin; most of them are in
the primary binding segment Val2(P4)-Glu9(P4') which
contains the reactive site bond Arg5(P1)-Ile6 and is in
a conformation observed also for other serine proteinase
inhibitors.

CMTI-I has a Ki for trypsin of ~1.5·10^-12 M.
McWherter et al. suggested substitution of "moderately
bulky hydrophobic groups" at P1 to confer HLE
specificity. They found that a wider set of residues
(VAL, ILE, LEU, ALA, PHE, MET, and GLY) gave detectable
binding to HLE. For cathepsin G, they expected bulky
(especially aromatic) side groups to be strongly
preferred. They found that PHE, LEU, MET, and ALA were
functional by their criteria; they did not test TRP,
TYR, or HIS. (Note that ALA has the second smallest
side group available.)

A preferred initial variegation strategy would be
to vary some or all of the residues ARG1, VAL2, PRO4,
ARG5, ILE6, LEU7, MET8, GLU9, LYS11, HIS25, GLY26, TYR27, and
GLY29. If the target were HNE, for example, one could
synthesize DNA embodying the following possibilities:
             vg                                    #AA seqs/
Parental     Codon   Allowed                       #DNA seqs

ARG1         VNT     R,S,L,P,H,I,T,N,V,A,D,G       12/12
VAL2         NWT     V,I,L,F,Y,H,N,D               8/8
PRO4         VYT     P,L,T,I,A,V                   6/6
ARG5         VNT     R,S,L,P,H,I,T,N,V,A,D,G       12/12
ILE6         NNK     all 20                        20/31
LEU7         VWG     L,Q,M,K,V,E                   6/6
TYR27        NAS     Y,H,Q,N,K,D,E                 7/8

This allows about 5.81·10^6 amino-acid sequences encoded by
about 1.03·10^7 DNA sequences. A library comprising 5.0·10^7
independent transformants [illegible] possible sequences.
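The counts for the CMTI-I variegation (codons VNT, NWT, VYT, VNT, NNK, VWG, and NAS) can be checked by expanding each mixed codon against the standard genetic code. The sketch below is illustrative; `expand` and `summarize` are hypothetical helpers, and the list of DNA counts used for the second total (12, 8, 6, 12, 31, 6, 8) is the reading of the table that reproduces the quoted figure of about 1.03·10^7.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code with bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def expand(mixed_codon):
    """Count how many DNA codons of a mixed codon encode each amino acid ('*' = stop)."""
    codons = ("".join(bases) for bases in product(*(IUB[b] for b in mixed_codon)))
    return Counter(CODON_TABLE[c] for c in codons)


def summarize(mixed_codon):
    """Return (distinct amino acids, sense codons, stop codons) for a mixed codon."""
    counts = expand(mixed_codon)
    stops = counts.pop("*", 0)
    return len(counts), sum(counts.values()), stops


VARIEGATION = ["VNT", "NWT", "VYT", "VNT", "NNK", "VWG", "NAS"]
n_aa = math.prod(summarize(c)[0] for c in VARIEGATION)

# DNA counts as read from the table; note that 8 for NAS includes the TAG stop codon.
TABLE_NDNA = [12, 8, 6, 12, 31, 6, 8]
n_dna_table = math.prod(TABLE_NDNA)

if __name__ == "__main__":
    for codon in VARIEGATION:
        print(codon, summarize(codon))
    print(f"amino-acid sequences: {n_aa:.3e}")
    print(f"DNA sequences (table values): {n_dna_table:.3e}")
```

One detail surfaces in the expansion: NAS yields seven sense codons plus the stop codon TAG, so the table's 7/8 appears to count the stop, whereas 20/31 for NNK does not.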
Seq T:l: i—s o £ t h ls —include: 

• h1 M to t I from Citrullus vuj^rxs <oi 
TryP sin inhxbxtor (OTLE87 ) , 

TryP sin inhxbxtor II fro- ^ 
TryP sin inhibitor -; ro f2^T^ OTLE87 ) 

tryp sin inhxbxtor III fr ^7^^ (in OTLE87), 
trypsin inh,bxtor XV om ^ ^ 

trypsin inhxbxtor II — - 0 TLE87), 

^•wv«r- Til from Cucurbxta p_e£u 

inhxbxtor Hi rr _, V11S (in OTLE87), 

TTh fr om Cucumxs sativus 

inhibitor Hb from 0 TLE87), 

tv from Cucumxs sa^xvus 

inhibxtor IV from ^_ „ laterium (FAVE89) , 

• h . h H 0 r II from F^lixum elaterxum 
trypsin xnhxbxtor II 0 TLE87). 
and inhibitor CM-1 from Mo.aordxca repe__ 



trypsin 
trypsin 
trypsin 
trypsin 
Another mini-protein that may be used as an initial
potential binding domain is the heat-stable enterotoxins
derived from some enterotoxigenic E. coli, Citrobacter
freundii, and other bacteria (GUAR89). These mini-
proteins are known to be secreted from E. coli and are
extremely stable. Works related to synthesis, cloning,
expression and properties of these proteins include:
BHAT86, SEKI85, SHIM87, TAKA85, TAKE90, THOM85a,b,
YOSH85, DALL90, DWAR89, GARI87, GUZM89, GUZM90, HOUG84,
KUBO89, KUPE90, OKAM87, OKAM88, and OKAM90.

Another preferred IPBD is crambin or one of its
homologues, the phoratoxins and ligatoxins (LECO87).
These proteins are secreted in plants. The 3D structure
of crambin has been determined. NMR data on homologues
indicate that the 3D structure is conserved. Residues
thought to be on the surface of crambin, phoratoxin, or
ligatoxin are preferred residues to vary.

EXAMPLE XV 

A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF CU(II),
ONE CYSTEINE, TWO HISTIDINES, AND ONE METHIONINE.

Sequences such as 
HIS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-CYS 
(SEQ ID NO: 40) and 

CYS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-HIS 
(SEQ ID NO:41) are likely to combine with Cu(II) to form 
structures as shown in the diagram: 






Xaa7- 

/ 

Xaa6 



Xaa5 
\ 

MET 4 
GLY3 Cu 



-Xaa8 
\ 

Xaa9 
1 

XaalO 
/ 

HISH 
/ \ 

\ 

ASN12 



Xaa7- 

/ 

Xaa6 



-Xaa8 
\ 

Xaa9 



ASN2-HIS1 CYS14-GLY13 
NH 2 COO 



XaalO 
/ 

HISH 
/ \ 

\J ASN12 

I / \ 1 

ASN2-CYS1 HIS14-GLY13 

I \ 
NH 2 COO 



Xaa5 
\ 

MET 4 
/ \ 

/ 

GLY3 
I 



NH2 COO 

Other arrangements of HIS, MET, HIS, and CYS along the
chain are also likely to form similar structures. The
amino acids [illegible] and at positions 12 and 13 give
the amino acids that carry the metal-binding ligands
enough flexibility for them to come together and bind
the metal. Other connecting sequences may be used
[illegible]. It is also possible to vary one or more
residues in the loops that join the first and second or
the third and fourth metal-binding residues. For example
(SEQ ID NO:42),
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Xaa8 



-Xaa9 



/ 



Xaa7 



XaalO 



Xaa6 Xaall 



\ / 



-MET5 HIS12 



Xaa4 



\ / \ 
\ / \ 



PRO 3 



Cu ASN13 



\ / \ 



GLY2-HIS1 CYS 1 5— GLY1 4 



NH 2 



COO 



is likely to form the diagrammed structure for a wide
variety of amino acids at Xaa4. It is expected that the
side groups of Xaa4 and Xaa6 will be close together and
on the surface of the mini-protein.

The variable amino acids are held so that they have
limited flexibility. This cross-linkage has some
differences from the disulfide linkage. The separation
between Cα4 and Cα11 is greater than the separation of the
Cα's of a cystine. In addition, the interaction of
residues 1 through 4 and 11 through 14 with the metal
ion is expected to limit the motion of residues 5
through 10 more than a disulfide between residues 4 and
11. A single disulfide bond exerts strong distance
constraints on the α carbons of the joined residues, but
very little directional constraint on, for example, the
vector from N to C in the main-chain.

For the desired sequence, the side groups of
residues 5 through 10 can form specific interactions
with the target. Other numbers of variable amino acids,
for example, 4, 5, [illegible], are appropriate. Larger
spans may be used when the enclosed sequence contains
segments having a high potential to form α helices or
other secondary structure that limits the conformational
[illegible]. Whereas a mini-protein having four CYSs
could form three distinct pairings, a mini-protein
having two HISs, one MET, and one CYS can form only two
distinct complexes with Cu. These two structures are
related by [illegible] symmetry through the Cu. Because
the [illegible] distinguishable, the structures are
different.

[illegible] such metal-containing mini-proteins are
displayed on filamentous [illegible] of the appropriate
[illegible] exposed to the metal ion after they are
separated from the cells.

EXAMPLE XVI

A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF ZN(II)
AND FOUR CYSTEINES

A cross link similar to the one shown in Example XV
is exemplified by zinc-finger proteins (GIBS88,
[illegible]). One family of zinc-fingers has two CYS and
two HIS residues in conserved positions that bind zinc
([illegible], CHOW87, EVAN88, BERG88, CHAV88). Gibson et
al. (GIBS88) review a number of sequences thought to form
zinc-fingers and propose a three-dimensional model for
these
compounds. Most of these sequences have two CYS and two
HIS residues in conserved positions, but some have three
CYS and one HIS residue. Gauss et al. (GAUS87) also
report a zinc-finger protein having three CYS and one
HIS residues that bind zinc. Hard et al. (HARD90)
report the 3D structure of a protein that comprises two
zinc-fingers, each of which has four CYS residues. All
of these zinc-binding proteins are stable in the
reducing intracellular environment.

One preferred example of a CYS::zinc cross linked
mini-protein comprises residues 440 to 461 of the
sequence shown in Figure 1 of HARD90. The residues 444
through 456 (SEQ ID NO: 43) may be variegated. One such
variegation is as follows:

Parental   Allowed                                   #AA / #DNA

SER444     SER, ALA                                  2 / 2
ASP445     ASP, ASN, GLU, LYS                        4 / 4
GLU446     GLU, LYS, GLN                             3 / 3
ALA447     ALA, THR, GLY, SER                        4 / 4
SER448     SER, ALA                                  2 / 2
GLY449     GLY, SER, ASN, ASP                        4 / 4
CYS450     CYS, PHE, ARG, LEU                        4 / 4
HIS451     HIS, GLN, ASN, LYS, ASP, GLU              6 / 6
TYR452     TYR, PHE, HIS, LEU                        4 / 4
GLY453     GLY, SER, ASN, ASP                        4 / 4
VAL454     VAL, ALA, ASP, GLY, SER, ASN, THR, ILE    8 / 8
LEU455     LEU, HIS, ASP, VAL                        4 / 4
THR456     THR, ILE, ASN, SER                        4 / 4

This leads to 3.77·10^7 DNA sequences that encode the same
number of amino-acid sequences. A library having 1.0·10^8
independent transformants will display 93% of the
allowed sequences; 2.0·10^8 independent transformants will
display 99.5% of allowed sequences.
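The library size and the two coverage figures can be checked, and inverted to ask how many transformants a target coverage needs. This sketch assumes each allowed sequence is equally likely in every independent transformant (a Poisson approximation that the text does not spell out); `coverage` and `transformants_needed` are hypothetical names.

```python
import math

# Number of allowed amino acids (equal to the number of codons) at each varied residue.
ALLOWED = {
    "SER444": 2, "ASP445": 4, "GLU446": 3, "ALA447": 4, "SER448": 2,
    "GLY449": 4, "CYS450": 4, "HIS451": 6, "TYR452": 4, "GLY453": 4,
    "VAL454": 8, "LEU455": 4, "THR456": 4,
}

n_sequences = math.prod(ALLOWED.values())


def coverage(n_transformants):
    """Expected fraction of allowed sequences present at least once."""
    return 1.0 - math.exp(-n_transformants / n_sequences)


def transformants_needed(fraction):
    """Independent transformants needed to reach a target expected coverage."""
    return -n_sequences * math.log(1.0 - fraction)


if __name__ == "__main__":
    print(f"allowed sequences: {n_sequences} (about {n_sequences:.3e})")
    print(f"coverage at 1.0e8 transformants: {coverage(1.0e8):.3f}")
    print(f"coverage at 2.0e8 transformants: {coverage(2.0e8):.4f}")
    print(f"transformants for 99.5% coverage: {transformants_needed(0.995):.2e}")
```

The inverse form is convenient when planning a library: doubling the number of transformants here moves the expected coverage from roughly 93% to roughly 99.5%.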
Table 1: Single-letter codes.

Single-letter code is used for proteins:

  a = ALA   c = CYS   d = ASP   e = GLU   f = PHE
  g = GLY   h = HIS   i = ILE   k = LYS   l = LEU
  m = MET   n = ASN   p = PRO   q = GLN   r = ARG
  s = SER   t = THR   v = VAL   w = TRP   y = TYR
  [illegible] = STOP   [illegible] = any amino acid
  b = n or d
  z = e or q
  x = any amino acid

Single-letter IUB codes for DNA:

  T, C, A, G stand for themselves
  M for A or C
  R for puRines A or G
  W for A or T
  S for C or G
  Y for pYrimidines T or C
  K for G or T
  V for A, C, or G (not T)
  H for A, C, or T (not G)
  D for A, G, or T (not C)
  B for C, G, or T (not A)
  N for any base.
Table 2: Preferred Outer-Surface Protein.

             Preferred
Genetic      Outer-Surface
Package      Protein         Reason for preference

M13          coat protein    a) [illegible]
             (gpVIII)        b) predictable post-
                                translational
                                processing,
                             c) numerous copies in
                                virion,
                             d) [illegible]

             gp III          a) fusion data available,
                             b) amino terminus exposed,
                             c) working example
                                available.

PhiX174      G protein       a) known to be on virion
                                exterior,
                             b) small enough that
                                the G-ipbd gene can
                                [illegible]

E. coli      LamB            a) fusion data available,
                             b) non-essential.

             OmpC            a) topological model
                             b) non-essential; abundant

             OmpA            a) topological model
                             b) non-essential;
                             c) homologues in other
                                [illegible]

             OmpF            a) topological model
                             b) non-essential; abundant

             PhoE            a) topological model
                             b) non-essential; abundant
                             c) inducible

B. subtilis  CotC            a) no post-translational
spores                          processing,
                             b) distinctive sequence
                                that causes protein to
                                localize in spore coat,
Table 3: Ambiguous DNA for AA_seq2

Each entry gives position, amino acid, and ambiguous codon(s).

  1 m A.T.G            2 k A.A.r            3 k A.A.r            4 s T.C.n/A.G.y
  5 l T.T.r/C.T.n      6 v G.T.n            7 l T.T.r/C.T.n      8 k A.A.r
  9 a G.C.n           10 s T.C.n/A.G.y     11 v G.T.n           12 a G.C.n
 13 v G.T.n           14 a G.C.n           15 t A.C.n           16 l T.T.r/C.T.n
 17 v G.T.n           18 p C.C.n           19 m A.T.G           20 l T.T.r/C.T.n
 21 s T.C.n/A.G.y     22 f T.T.y           23 a G.C.n           24 r C.G.n/A.G.r
 25 p C.C.n           26 d G.A.y           27 f T.T.y           28 c T.G.y
 29 l T.T.r/C.T.n     30 e G.A.r           31 p C.C.n           32 p C.C.n
 33 y T.A.y           34 t A.C.n           35 g G.G.n           36 p C.C.n
 37 c T.G.y           38 k A.A.r           39 a G.C.n           40 r C.G.n/A.G.r
 41 i A.T.h           42 i A.T.h           43 r C.G.n/A.G.r     44 y T.A.y
 45 f T.T.y           46 y T.A.y           47 n A.A.y           48 a G.C.n
 49 k A.A.r           50 a G.C.n           51 g G.G.n           52 l T.T.r/C.T.n
 53 c T.G.y           54 q C.A.r           55 t A.C.n           56 f T.T.y
 57 v G.T.n           58 y T.A.y           59 g G.G.n           60 g G.G.n
 61 c T.G.y           62 r C.G.n/A.G.r     63 a G.C.n           64 k A.A.r
Table 3, continued.

 65 r C.G.n/A.G.r     66 n A.A.y           67 n A.A.y           68 f T.T.y
 69 k A.A.r           70 s T.C.n/A.G.y     71 a G.C.n           72 e G.A.r
 73 d G.A.y           74 c T.G.y           75 m A.T.G           76 r C.G.n/A.G.r
 77 t A.C.n           78 c T.G.y           79 g G.G.n           80 g G.G.n
 81 a G.C.n           82 a G.C.n           83 e G.A.r           84 g G.G.n
 85 d G.A.y           86 d G.A.y           87 p C.C.n           88 a G.C.n
 89 k A.A.r           90 a G.C.n           91 a G.C.n           92 f T.T.y
 93 n A.A.y           94 s T.C.n/A.G.y     95 l T.T.r/C.T.n     96 q C.A.r
 97 a G.C.n           98 s T.C.n/A.G.y     99 a G.C.n          100 t A.C.n
101 e G.A.r          102 y T.A.y          103 i A.T.h          104 g G.G.n
105 y T.A.y          106 a G.C.n          107 w T.G.G          108 a G.C.n
109 m A.T.G          110 v G.T.n          111 v G.T.n          112 v G.T.n
113 i A.T.h          114 v G.T.n          115 g G.G.n          116 a G.C.n
117 t A.C.n          118 i A.T.h          119 g G.G.n          120 i A.T.h
121 k A.A.r          122 l T.T.r/C.T.n    123 f T.T.y          124 k A.A.r
125 k A.A.r          126 f T.T.y          127 t A.C.n          128 s T.C.n/A.G.y
129 k A.A.r          130 a G.C.n          131 s T.C.n/A.G.y    132 stop T.A.r/T.G.A
133 stop T.A.r/T.G.A 134 stop T.A.r/T.G.A
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Table 4: Table of Restriction Enzyme Suppliers 



Suppliers:

Sigma Chemical Co.
P.O. Box 14508
St. Louis, Mo. 63178

Bethesda Research Laboratories
P.O. Box 6009
Gaithersburg, Maryland, 20877

Boehringer Mannheim Biochemicals
7941 Castleway Drive
Indianapolis, Indiana, 46250

International Biochemicals, Inc.
P.O. Box 9558
New Haven, Connecticut, 06535

New England BioLabs
32 Tozer Road
Beverly, Massachusetts, 01915

Promega
2800 S. Fish Hatchery Road
Madison, Wisconsin, 53711

Stratagene Cloning Systems
11099 North Torrey Pines Road
La Jolla, California, 92037
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Table 5: Potential sites in ipbd gene. 
Summary of cuts 



10 



15 



20 



25 



30 



Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
386 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
383 
Enz 
Enz 
Enz 



%Acc I has 3 elective sites 
Af 1 II has 1 elective sites 
Apa I has 2 elective sites 
Asu II has 1 elective sites 
has 



Ava III 



elective sites 
elective sites 
elective sites 
elective sites 
elective sites 

elective sites 

+ Esp I has 2 elective sites 
Hind III has 6 elective sites 



= BspM II has 
= BssH II has 
- % BstX I has 
= + Dra II has 
= + EcoN I has 
I 



1 

1 
2 
1 
3 
2 



= Kpn 
= Mlu 
= Nar 
= Nco 

- Nhe 
= Nru 
= + Pf 1M 
= PmaC 

- + PpuM 
= +Rsr 
= +Sf i 
= Spe I 
= Sph I 
= Stu I 
= % Sty 



has 1 elective sites 
has 1 elective sites 
has 2 elective sites 
has 1 elective sites 
has 3 elective sites 
has 2 elective sites 
I has 1 elective sites 

I has 1 elective sites 

I has 2 elective sites 

II has 1 elective sites 
I has 2 elective sites 

has 3 elective sites 
has 1 elective sites 
has 5 elective sites 
I has 6 elective sites 



= Xba I has 1 elective sites 
= Xho I has 1 elective sites 
= Xma III has 3 elective sites 



96 169 281 
19 
102 103 
381 

314 

72 

67 115 
323 

102 103 226 
62 94 
57 187 
: 9 23 60 287 361 



48 

314 

238 

323 

25 2 

38 6 
94 
228 
10 
10 
24 

12 4 

221 

23 7 
11 



343 

89 388 
5 



2 226 
2 

261 
5 379 

0 150 287 386 
44 143 263 323 



84 
85 



70 209 242 



Enzymes not cutting ipbd 



Avr II 
EcoR I 
Sac I 
Xma I 



BamH I 
EcoR V 
Sal I 



Bel 
Hpa 
Sau 



BstE II 
"i 
I 



Not 
Sma 
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^..«--.--£." lilw - ln * 11 "' 

HEADER — — 2LZM 

Coordinates fro. Broofchaven Prote.n 
Only Molecule A was considered. 
FE H^RaLASBlO-OLYCOSTL, ^ 

„ J HOGLE , S Rtonic radil in T a b le 

Solvent radius = 

7 . 

Surface area measured in A 2 • 

Max 

10 

TYPS N <area> sigma max m £po B ed (fraction)_ _ 
''"VaYl 85 - 1( °' 40) 

,-, n 1 47 214.3 207.1 &) 

25 ALA2 o 2 8 3 56 245.5 234.4 ^ Q . 47) 

CYS \n 2711 5.36 281.4 262 5 1 4) 

ASP X l 111 2 5.78 304.9 285.4 8(0 .32> 

GLU 10 297.2 325 _ 4 30 7.5 * } 

PRE 8 316.6 5.9 ig8 _ 3 183i3 91-9 

30 GLY 23 185.5 ^ 294 5 32 -| q _ ^ 

HIS c 278 1 3.61 285.6 269.6 q48) 
ILE 16 278 1 3219 300 1 1 39) 

LYS 1! 282 6 6.75 304.0 269.8 1 q>30) 

LEU 24 282.6 299>5 283 1 53) 

35 ,1 273 0 5.75 285.1 262.6 1 Q . 54) 

ASN 26 273° 242<1 234.6 (0 . 49 ) 

PR ° l 299 5 4.75 305.8 291.5 1 q1Q) 

GLN ,! 344 7 8.66 355.8 326.7 (0 . 43 ) 

ARG 24 344./ 2 36.6 223.3 } 

40 SER 16 228 6 3 59 ^ ^0.44) 

THR 18 . 254 3 4.05 261-8 245.7 11 28) 

VALl o 359 4 3.38 366.4 355.1 (0 . 22) 

TRP 9 ^8 4.97 342.0 325.0 

TYR 9 335. b 
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Table 7: Atomic radii 

                  Å

C alpha          1.70
O carbonyl       1.52
N amide          1.55
Other atoms      1.80


Table 8

Fraction of DNA molecules having
n non-parental bases when
reagents that have fraction
M of parental nucleotide are used.

M      .9965      .97716     .92612     .8577      .79433     .63096
f0     .9000      .5000      .1000      .0100      .0010      .000001
f1     .09499     .35061     .2393      .04977     .00777     .0000175
f2     .00485     .1188      .2768      .1197      .0292      .000149
f3     .00016     .0259      .2061      .1854      .0705      .000812
f4     .000004    .00409     .1110      .2077      .1232      .003207
f8     0.         2·10^-7    .00096     .0336      .1182      .080165
f16    0.         0.         0.         5·10^-7    .00006     .027281
f23    0.         0.         0.         0.         0.         .0000089
most   0          0          2          5          7          12

"most" is the value of n having the highest
probability.
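
The f rows of Table 8 can be reproduced with a binomial model. The following Python sketch is illustrative only and rests on two assumptions that the table does not state: that each base is replaced independently with probability 1 - M, and that 30 bases are varied (30 is the length for which M raised to that power gives the f0 row). The function names are hypothetical.

```python
from math import comb


def fraction_with_n_changes(m, n, length=30):
    """Binomial probability that exactly n of `length` bases are non-parental.

    m      -- fraction of parental nucleotide in each reagent mixture
    length -- number of varied bases (assumed to be 30, see lead-in)
    """
    return comb(length, n) * (1.0 - m) ** n * m ** (length - n)


def purity_for_unchanged_fraction(f0, length=30):
    """Per-base parental fraction M that leaves a fraction f0 fully parental."""
    return f0 ** (1.0 / length)


if __name__ == "__main__":
    # Print one line per column of Table 8, for the n values listed there.
    for f0 in (0.9, 0.5, 0.1, 0.01, 0.001, 0.000001):
        m = purity_for_unchanged_fraction(f0)
        row = [fraction_with_n_changes(m, n) for n in (0, 1, 2, 3, 4, 8, 16, 23)]
        print(f"M={m:.5f}", " ".join(f"{x:.6f}" for x in row))
```

Under these assumptions, M = 0.97716 gives f1 of about 0.3506 and f2 of about 0.1188, in agreement with the second column of the table.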
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Table 9: best vgCodon 



Program "Find Optimum vgCodon."

INITIALIZE-MEMORY-OF-ABUNDANCES
DO ( t1 = 0.21 to 0.31 in steps of 0.01 )
. DO ( c1 = 0.13 to 0.23 in steps of 0.01 )
. . DO ( a1 = 0.23 to 0.33 in steps of 0.01 )
Comment calculate g1 from other concentrations
. . . g1 = 1.0 - t1 - c1 - a1
. . . IF ( g1 .ge. 0.15 )
. . . . DO ( a2 = 0.37 to 0.50 in steps of 0.01 )
. . . . . DO ( c2 = 0.12 to 0.20 in steps of 0.01 )
Comment Force D+E = R + K
. . . . . . g2 = (g1*a2 - .5*a1*a2) / (c1 + 0.5*a1)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF ( g2 .gt. 0.1 .and. t2 .gt. 0.1 )
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! c2
. . . . end_DO_loop ! a2
. . . end_IF_block ! if g1 big enough
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1
WRITE the best distribution and the abundances.
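
The CALCULATE-ABUNDANCES step is not spelled out in the listing. A minimal Python sketch of one way to carry it out is given below, assuming the standard genetic code and independent choice of nucleotide at each of the three codon positions; the function names are hypothetical, and the comparison criterion used by COMPARE-ABUNDANCES-TO-PREVIOUS-ONES is not reproduced here. The second function restates the "Force D+E = R + K" line, which holds when the third base is an equimolar mixture of T and G.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def abundances(p1, p2, p3):
    """Expected abundance of each amino acid (and '*' for stop).

    p1, p2, p3 map each base to its fraction at codon positions 1, 2 and 3.
    """
    result = {}
    for a, b, c in product(BASES, repeat=3):
        aa = CODON_TABLE[a + b + c]
        result[aa] = result.get(aa, 0.0) + p1[a] * p2[b] * p3[c]
    return result


def g2_for_charge_balance(g1, a1, c1, a2):
    """Second-position G fraction that makes [D]+[E] equal [K]+[R]
    when the third base is T or G at 0.5 each (as in the listing)."""
    return (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)


if __name__ == "__main__":
    # Nucleotide distribution of Table 10, part A.
    p1 = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
    p2 = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
    p3 = {"T": 0.5, "C": 0.0, "A": 0.0, "G": 0.5}
    for aa, frac in sorted(abundances(p1, p2, p3).items()):
        print(f"{aa}  {100 * frac:5.2f}%")
```

With the part A distribution of Table 10 this sketch yields about 7.02% for S, 2.86% for W and 5.20% for stop.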
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Table 10: Abundances obtained 
from various vgCodons 

A. Optimized qfk Codon, Restrained by [D] + [E] = [K] + [R]

         T      C      A      G
   1    .26    .18    .26    .30     q
   2    .22    .16    .40    .22     f
   3    .5     .0     .0     .5      k

Amino                   Amino
acid    Abundance       acid    Abundance
A       4.80%           C       2.86%
D       6.00%           E       6.00%
F       2.86%           G       6.60%
H       3.60%           I       2.86%
K       5.20%           L       6.82%
M       2.86%           N       5.20%
P       2.88%           Q       3.60%
R       6.82%           S       7.02% mfaa
T       4.16%           V       6.60%
W       2.86%           Y       5.20%
stop    5.20%

[D] + [E] = [K] + [R] = .12
ratio = Abun(W)/Abun(S) = 0.4074

j    (1/ratio)^j    (ratio)^j       stop-free
1      2.454        .4074           .9480
2      6.025        .1660           .8987
3     14.788        .0676           .8520
4     36.298        .0275           .8077
5     89.095        .0112           .7657
6    218.7          4.57·10^-3      .7258
7    536.8          1.86·10^-3      .6881
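
The three columns that accompany each part of Table 10 follow from the abundances alone. The Python sketch below is a hedged illustration, assuming that j counts independently varied codons, that "ratio" is the abundance of the least favored amino acid divided by that of the most favored one, and that "stop-free" is the probability that none of the j codons is a stop codon. The function name is hypothetical.

```python
def library_statistics(least, most, stop, max_j=7):
    """Return rows (j, (1/ratio)^j, ratio^j, stop-free) for j = 1..max_j.

    least -- abundance of the least favored amino acid (fraction, not percent)
    most  -- abundance of the most favored amino acid
    stop  -- abundance of stop codons
    """
    ratio = least / most
    return [
        (j, (1.0 / ratio) ** j, ratio ** j, (1.0 - stop) ** j)
        for j in range(1, max_j + 1)
    ]


if __name__ == "__main__":
    # Part A values: W = 2.86%, S = 7.02%, stop = 5.20%.
    for j, inverse, direct, stop_free in library_statistics(0.0286, 0.0702, 0.052):
        print(f"{j}  {inverse:9.3f}  {direct:.4g}  {stop_free:.4f}")
```

For part A this gives a ratio near 0.4074 and a stop-free fraction near 0.6881 at j = 7.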
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Table 10: Abundances obtained 
from various vgCodon 
(continued) 

B. Unrestrained, optimized

         T      C      A      G
   1    .27    .19    .27    .27
   2    .21    .15    .43    .21
   3    .5     .0     .0     .5

Amino                   Amino
acid    Abundance       acid    Abundance
A       4.05%           C       2.84%
D       5.81%           E       5.81%
F       2.84%           G       5.67%
H       4.08%           I       2.84%
K       5.81%           L       6.83%
M       2.84%           N       5.81%
P       2.85%           Q       4.08%
R       6.83%           S       6.89% mfaa
T       4.05%           V       5.67%
W       2.84% lfaa      Y       5.81%
stop    5.81%

[D] + [E] = 0.1162     [K] + [R] = 0.1264
ratio = Abun(W)/Abun(S) = 0.41176

j    (1/ratio)^j    (ratio)^j       stop-free
1      2.4286       .41176          .9419
2      5.8981       .16955          .8872
3     14.3241       .06981          .8356
4     34.7875       .02875          .7871
5     84.4849       .011836         .74135
6    205.180        .004874         .69828
7    498.3          2.007·10^-3     .6577
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Table 10: Abundances obtained 
from various vgCodon 
(continued) 



C. Optimized NNT

         T        C        A        G
   1    .2071    .2929    .2071    .2929
   2    .2929    .2071    .2929    .2071
   3    1.       .0       .0       .0

Amino                   Amino
acid    Abundance       acid    Abundance
A       6.06%           C       4.29% lfaa
D       8.58%           E       none
F       6.06%           G       6.06%
H       8.58%           I       6.06%
K       none            L       8.58%
M       none            N       6.06%
P       6.06%           Q       none
R       6.06%           S       8.58%
T       4.29% lfaa      V       8.58%
W       none            Y       6.06%
stop    none

j    (1/ratio)^j    (ratio)^j       stop-free
1      2            .5              1
2      4            .25             1
3      8            .125            1
4     16            .0625           1
5     32.0          .03125          1
6     64.0          .015625         1
7    128.0          .0078125        1
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Table 10: Abundances obtained 
from various vgCodon 
(continued) 

D. Optimized NNG

         T       C       A       G
   1    .23     .21     .23     .33
   2    .215    .285    .285    .215
   3    .0      .0      .0      1.0

Amino                   Amino
acid    Abundance       acid    Abundance
A       9.40%           C       none
D       none            E       9.40%
F       none            G       7.10%
H       none            I       none
K       6.60%           L       9.50% mfaa
M       4.90%           N       none
P       6.00%           Q       6.00%
R       9.50%           S       6.60%
T       6.6%            V       7.10%
W       4.90% lfaa      Y       none
stop    6.60%

j    (1/ratio)^j    (ratio)^j       stop-free
1      1.9388       .51579          0.934
2      3.7588       .26604          0.8723
3      7.2876       .13722          0.8148
4     14.1289       .07078          0.7610
5     27.3929       3.65·10^-2      0.7108
6     53.109        1.88·10^-2      0.6639
7    102.96         9.72·10^-3      0.6200






Table 10: Abundances obtained 
from optimum vgCodon 
(continued) 

E. Unoptimized NNS (NNK gives identical distribution)

         T      C      A      G
   1    .25    .25    .25    .25
   2    .25    .25    .25    .25
   3    .0     .5     .0     .5

Amino                   Amino
acid    Abundance       acid    Abundance
A       6.25%           C       3.125%
D       3.125%          E       3.125%
F       3.125%          G       6.25%
H       3.125%          I       3.125%
K       3.125%          L       9.375%
M       3.125%          N       3.125%
P       6.25%           Q       3.125%
R       9.375%          S       9.375%
T       6.25%           V       6.25%
W       3.125%          Y       3.125%
stop    3.125%

j    (1/ratio)^j    (ratio)^j       stop-free
1       3.0         .33333          .96875
2       9.0         .11111          .93853
3      27.0         .03704          .90915
4      81.0         .01234567       .8807
5     243.0         .0041152        .8532
6     729.0         1.37·10^-3      .82655
7    2187.0         4.57·10^-4      .8007
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Table 11: Calculate worst codon. 

Program "Find worst vgCodon within Serr of given distribution."

INITIALIZE-MEMORY-OF-ABUNDANCES
Comment Serr is % error level.
READ Serr
Comment T1i, C1i, A1i, G1i, T2i, C2i, A2i, G2i, T3i, G3i
Comment are the intended nt-distribution.
READ T1i, C1i, A1i, G1i
READ T2i, C2i, A2i, G2i
READ T3i, G3i
Fdwn = 1. - Serr
Fup = 1. + Serr
DO ( t1 = T1i*Fdwn to T1i*Fup in 7 steps)
. DO ( c1 = C1i*Fdwn to C1i*Fup in 7 steps)
. . DO ( a1 = A1i*Fdwn to A1i*Fup in 7 steps)
. . . g1 = 1. - t1 - c1 - a1
. . . IF( (g1-G1i)/G1i .lt. -Serr)
Comment g1 too far below G1i, push it back
. . . . g1 = G1i*Fdwn
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . IF( (g1-G1i)/G1i .gt. Serr)
Comment g1 too far above G1i, push it back
. . . . g1 = G1i*Fup
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . DO ( a2 = A2i*Fdwn to A2i*Fup in 7 steps)
. . . . DO ( c2 = C2i*Fdwn to C2i*Fup in 7 steps)
. . . . . DO ( g2 = G2i*Fdwn to G2i*Fup in 7 steps)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( (t2-T2i)/T2i .lt. -Serr)
Comment t2 too far below T2i, push it back
. . . . . . . t2 = T2i*Fdwn
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
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Table 11, continued 

. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF( (t2-T2i)/T2i .gt. Serr)
Comment t2 too far above T2i, push it back
. . . . . . . t2 = T2i*Fup
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF( g2 .gt. 0.0 .and. t2 .gt. 0.0 )
. . . . . . . t3 = 0.5*(1.-Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5*(1.+Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! g2
. . . . end_DO_loop ! c2
. . . end_DO_loop ! a2
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1
WRITE the WORST distribution and the abundances.
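
The "push it back" blocks of the listing apply one rule twice: the fraction that is computed by difference (g1 or t2) is held within a relative error Serr of its intended value, and the three independently varied fractions are rescaled so that the four still sum to 1. A minimal Python sketch of that rule follows; the function name and argument names are hypothetical and are not taken from the listing.

```python
def clamp_and_rescale(free, intended_dependent, serr):
    """Keep the dependent fraction within +/- serr (relative) of its target.

    free               -- the three independently varied fractions
    intended_dependent -- intended value of the fourth fraction
    serr               -- allowed relative error, e.g. 0.05 for 5%

    Returns (rescaled_free, dependent) with sum(rescaled_free) + dependent == 1.
    """
    dependent = 1.0 - sum(free)
    low = intended_dependent * (1.0 - serr)
    high = intended_dependent * (1.0 + serr)
    if low <= dependent <= high:
        return list(free), dependent          # already inside the band
    dependent = low if dependent < low else high
    factor = (1.0 - dependent) / sum(free)    # same "factor" as in the listing
    return [x * factor for x in free], dependent


if __name__ == "__main__":
    # t1, c1, a1 all 5% high; the intended g1 is 0.30.
    free, g1 = clamp_and_rescale([0.273, 0.189, 0.273], 0.30, 0.05)
    print(free, g1)
```

In the usage example all three varied fractions are 5% above their intended values, which would leave g1 at 0.265; the rule raises g1 to its lower bound of 0.285 and shrinks the other three proportionally.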
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Table 12 : Abundances obtained 
using optimum vgCodon assuming 
5% errors 



Amino                   Amino
acid    Abundance       acid    Abundance
A       4.59%           C       2.76%
D       5.45%           E       6.02%
F       2.49% lfaa      G       6.63%
H       3.59%           I       2.71%
K       5.73%           L       6.71%
M       3.00%           N       5.19%
P       3.02%           Q       3.97%
R       7.68% mfaa      S       7.01%
T       4.37%           V       6.00%
W       3.05%           Y       4.77%
stop    5.27%

ratio = Abun(F)/Abun(R) = 0.3248

j    (1/ratio)^j    (ratio)^j       stop-free
1       3.079       .3248           .9473
2       9.481       .1055           .8973
3      29.193       .03425          .8500
4      89.888       .01112          .8052
5     276.78        3.61·10^-3      .7627
6     852.22        1.17·10^-3      .7225
7    2624.1         3.81·10^-4      .6844
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Table 13: BPTI Homologues 
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Table 13, continued 



R#   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
38   C  T  A  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
39   R  R  R  Q  R  R  R  R  R  R  R  G  G  G  G  G  K  R  G
40   A  A  A  G  A  A  A  A  A  A  A  G  G  G  G  G  G  G  G
41   K  K  K  N  K  K  K  K  K  K  K  N  N  N  N  N  N  N  N
42   R  R  R  N  S  R  R  R  R  R  R  S  A  A  A  A  K  Q  A
43   N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N
44   N  N  N  N  N  N  N  N  N  N  N  R  R  R  R  N  N  R  R
45   F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F
46   K  K  K  E  K  K  K  K  K  K  K  K  K  K  K  E  K  D  K
47   S  S  S  T  S  S  S  S  S  S  S  T  T  T  T  T  T  T  T
48   A  A  A  T  A  A  A  A  A  A  A  I  I  I  I  R  K  T  I
49   E  E  E  E  E  E  E  E  E  E  E  E  E  D  D  D  A  Q  E
50   D  D  D  M  D  D  D  D  D  D  D  E  E  E  E  E  E  Q  E
51   C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
52   M  M  M  L  M  M  M  M  M  M  E  R  R  R  H  R  V  Q  R
53   R  R  R  R  R  R  R  R  R  R  R  R  R  R  R  E  R  G  R
54   T  T  T  I  T  T  T  T  T  T  T  T  T  T  T  T  A  V  T
55   C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
56   G  G  G  E  G  G  G  G  G  G  G  I  V  V  V  G  R  V  V


57 


G 


G 


G 


P 


G 


G 


G 


G 


G 


G 


G 


R 


G 


G 


G 


G 


P 




G 


58 


A 


A 


A 


P 


A 


A 


A 


A 


A 


A 


A 


K 








K 


P 






59 








Q 


























E 






60 








Q 


























R 






61 








T 


























P 






62 








D 
































63 








K 
































64 








S 

































I BPTI (SEQ ID NO: 44) 

5 2 Engineered BPTI From MARK8 7 ( Stfg lb MflM|Q 

3 Engineered BPTI From MARK8 7 *>gft tf% H** 

4 Bovine Colostrum (DUFT85) (%CCt lO MQt M7 ^ 

5 Bovine Serum (DUFT85) CStTolb M0*M^ 

6 Semisynthetic BPTI, TSCH87 $>gQ Ift MO: V*) 
10 7 Semisynthetic BPTI, TSCH87 f*C«l NO'.fftp 

8 Semisynthetic BPTI, TSCH87 tseoi , *Q MO! <3l\ 

9 Semisynthetic BPTI, TSCH87 fcfrgfr y fr ^No: 

10 emi synthetic BPTI, TSCH87 f ^ l ft gg^ 

II Engineered BPTI, AUER87 tS(C$t lb Hi/y 'fr q^ 

15 12 Dendroaspis polylepis polylepis (Black mainba) venom I 

(DUFT85) (Sgq ^ MfKCSj 



13 
14 



15 
16 
17 

10 
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Table 13, continued 



gendroasEi^olxleEi^l^ (BlaOc «a*a, veno m K 

(DUFT85) Cjeg \[> VO'.S*) mV^HHV II 

^^uB^^hB^BB (Ringhal b Cobra) HHV 

(DUFT85) ^ (DUFT85) aggJDNfllfctL 

Vipera russelli (Russel s v p g fQ (PKtC K,a_l ~" 

iid^el turtle egg whxte * } < iN^.fe^ 

18 Snail mucus ^^^^^^^^^ 

19 gfg^^!?5s^ 
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Table 13 : BPTI Homologues (continued) 



# 


20 


21 


22 


23 


24 


25 


26 


27 


28 


29 


30 


31 


32 


33 


34 


35 
































































































T 


P 


- 


- 


R#  20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
-2   Z  -  L  Z  R  K  -  -  -  R  R  -  E  T  -  -
-1   P  -  Q  D  D  N  -  -  -  Q  K  -  R  T  -  -
 1   R  R  H  H  R  R  I  K  T  R  R  R  G  D  K  T
 2   R  P  R  P  P  P  N  E  V  H  H  P  F  L  A  V
 3   K  Y  T  K  K  T  G  D  A  R  P  D  L  P  D  E
 4   L  A  F  F  F  F  D  S  A  D  D  F  D  I  S  A
 5   C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
 6   I  E  K  Y  Y  N  E  Q  N  D  D  L  T  E  Q  N
 7   L  L  L  L  L  L  L  L  L  K  K  E  S  Q  L  L
 8   H  I  P  P  P  L  P  G  P  P  P  P  P  A  D  P
 9   R  V  A  A  A  P  K  Y  V  P  P  P  P FG  Y  I
10   N  A  E  D  D  E  V  S  I  D  D  Y  V  D  S  V
11   P  A  P  P  P  T  V  A  R  K  T  T  T  A  Q  Q
12   G  G  G  G  G  G  G  G  G  G  K  G  G  G  G  G
13   R  P  P  R  R  R  P  P  P  N  I  P  P  L  P  P
14   C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
15   Y  M  K  K  L  N  R  M  R  -  -  K  R  F  L  R
16   D  F  A  A  A  A  A  G  A  G  Q  A  A  G  G  A
17   K  F  S  H  Y  L  R  M  F  P  T  K  G  Y  L  F
18   I  I  I  I  M  I  F  T  I  V  V  M  F  M  F  I
19   P  S  P  P  P  P  P  S  Q  R  R  I  K  K  K  Q
20   A  A  A  R  R  A  R  R  L  A  A  R  R  L  R  L
21   F  F  F  F  F  F  Y  Y  W  F  F  Y  Y  Y  Y  W
22   Y  Y  Y  Y  Y  Y  Y  F  A  Y  Y  F  N  S  F  A
23   Y  Y  Y  Y  Y  Y  Y  Y  F  Y  Y  Y  Y  Y  Y  F
24   N  S  N  D  N  N  N  N  D  D  K  N  N  N  N  D
25   Q  K  W  S  P  S  S  G  A  T  P  A  T  Q  G  A
26   K  G  A  A  A  H  S  T  V  R  S  K  R  E  T  V
27   K  A  A  S  S  L  S  S  K  L  A  A  T  T  S  K
28   K  N  K  N  N  H  K  M  G  K  K  G  K  K  M  G
29   Q  K  K  K  K  K  R  A  K  T  R  F  Q  N  A  K
30   C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
31   E  Y  Q  N  E  Q  E  E  V  K  V  E  E  E  E  V
32   R  P  L  K  K  K  K  T  L  A  Q  T  P  E  T  R
33   F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F
34   D  T  H  I  I  N  I  Q  P  Q  R  V  K  I  L  S
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Table 13 , continued 
2 7 28 29 



30 31 



32 33 



34 35 




60 
61 



20 



10 



15 



n Gre en Mamba) 

Sfe^S^3^a*a> B toxin 
(DUFT85) ^^^T^tr^Sii^g^ , 

^ s V e a anemone) 5 II ^85)1^55^3 
"inactive" domaxn ^g^^lS 

HI-8 "active" ^^.-tfcll©!™* 

beTTbuniarotoxxn 
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Table 13, Continued 



33 
34 

10 35 



3 0 beta bungarotoxin B2 (DUFT85) *^£^ \P ^ Q : 7ft) 

31 Bovine spleen TI II (FIOR85) ^S€<? MO'. 1<Q 

32 Tachypleus trident atus (Horseshoe crab) hemocytef 
inhibitor (NAKA87) CSCQ lb MOUSO 



Bombyx mori (silkworm) SCI-III ( SAS A8 4 ) CS^O IP Nft*."70 
Bos taurus (inactive) BI-14 Q 1 1> *J ft » T| ^ 

Bos taurus (active) BI-8 fS^Q ID Nl O « t S» ^ 



Table 13, continued 



R#  36 37 38 39 40
37   G  G  G  G  G
38   C  C  C  C  C
39   R  R  R  R  K
40   A  A  A  A  A
41   K  K  K  K  K
42   R  S  R  R  S
43   N  N  N  N  N
44   N  N  N  N  N
45   F  F  F  F  F
46   K  K  K  K  R
47   S  S  S  S  S
48   A  A  S  A  A
49   E  E  E  E  E
50   D  D  D  D  D
51   C  C  C  C  C
52   E  M  M  M  M
53   R  R  R  R  R
54   T  T  T  T  T
55   C  C  C  C  C
56   G  G  G  G  G
57   G  G  G  G  G
58   A  A  A  A  A
59
60
61



36: Engineered BPTI (KR15, ME52): Auerswald '88, Biol Chem 

Hoppe-Seyler , 369 Supplement, pp27- 3 5 . (JS Sfl \ ft N6 > 7 ^ ) 
37: Isoaprotinin G-l: Siekmann, Wenzel, Schroder, and 

Tschesche '88, Biol Chem Hoppe-Seyler, 369:157-163. tb Mt7*gg) 

38: Isoaprotinin 2: Siekmann, Wenzel, Schroder, and 

Tschesche ! 88, Biol Chem Hoppe-Seyler, 369:157-163. ^*601 10 KfO'ffl j 
39: Isoaprotinin G-2 : Siekmann, Wenzel, Schroder, and 

Tschesche '88, Biol Chem Hoppe-Seyler, 369 : 157-163 . lS£«L <D MO»92\ 
40: Isoaprotinin 1: Siekmann, Wenzel, Schroder, and 

Tschesche '88, Biol Chem Hoppe-Seyler, 3^:157-163.(5^^X^^02*?, 



Notes : 
a) 
b) 

c) 

d) 
e) 
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Table 13, continued 



ro residue 15 deleted. 
bo th beta bungarotoxin; > have c5 and cl4; 

B^JfiEi has . an Tl lnd S to residue 9 
^rr P -e- h ave C a, 3, X*. 30.3S.S. 

& 55 * v>=>^ F33 and G3-7 . 

all ^ologuas^avej^^ torm interchal n 

extra v_ » 
cystine bridges 



1 

2 

5 3 
4 
5 
6 
7 

10 8 
9 



12 

15 13 



• nodes for Tables 14 and 15 
Identification codes r 

BPTI „, . Kaise r, biochem. 16(8)1531-41 

synthetic B** 1 ' ? 
Semisynthetic BPTI, 
Semisynthetic BPTI, 
Semisynthetic BPTI, TS 
Semisynthetic BPTI 
Semisynthetic BPTI 

Engineered BPTI, ^ 511A 

9 BPTI Auerswald tel G 2Q8 511A 

10 BPTI ^ rS /^ I & f ro rMARK87 

H Engmeerea marK87 Hoppe-Seyler , 

Engineered BPTI ld , 88f Biol Chem H PP 

BPTI (KR15,ME52) . ^uei 

14 -i^^u- -x. — En9ineering 

3 ( ,)S91-59S (-0, _ ^ b1 , 88 , B iol Cne. 
« SS!Sxe,. G 3--|I;- nn . t .X .BB. Bio, C h e. 
16 T'^k -1 OB 3 30. 51 f 

U BpS Bnginee^ —aid »1 Biol chem 

20 Hrppe-stylir, "3.9:157-163. 511R 

21 engineered. = ^ £ QB 2 208 511R 

BPTI E Te"r t t; Dufton '85) 
Bovine Serum (FI0 RB5) 

bovine spleen Tl J-x (WAGN7 8) 

Sail »-\ (He "i/ e r (R inUl= HHV " Un 

Hemachatus hemachates (K 

D uf ton ■ 85) white (in Duf ton ' 85) 

-, a Naja nivea b toxin (in Dufton 

30 Bungarus ^f^Sin (in Button '85) 
,i vipera ammodytes i C REI87) . fc H TN90) 

31 Porcine ITI ^^l^PP protease inhibitor, (SHIN90) 
Human Alzheimer's beta AP P & charl s 

34 TsTaurus ^i-> ^ (ITI f^Uon '85) 
H ^onia sulcata (sea anemone) 5 



22 

30 23 
24 
25 
26 



40 32 
33 
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Identification codes for Tables 14 and 15 



37 Dendroaspis polylepis polylepis (Black Mamba) E toxin (in Dufton '85)
38 Vipera russelli (Russell's viper) RVV II (TAKA74)
39 Tachypleus tridentatus (Horseshoe crab) hemocyte inhibitor (NAKA87)
40 LACI 2 (Factor Xa) (WUNT88)
41 Vipera ammodytes CTI toxin (in Dufton '85)
42 Dendroaspis polylepis polylepis (Black Mamba) venom K (in Dufton '85)
43 Homo sapiens HI-8e "inactive" domain (in Dufton '85)
44 Green Mamba toxin K, (in CREI87)
45 Dendroaspis angusticeps (Eastern green mamba) C13 S1 C3 toxin (in Dufton '85)
46 LACI 3
47 Equine ITI domain 2, (CREI87)
48 LACI 1 (VIIa)
49 Dendroaspis polylepis polylepis (Black mamba) B toxin (in Dufton '85)
50 Porcine ITI domain 2, Creighton and Charles
51 Homo sapiens HI-8t "active" domain (in Dufton '85)
52 Bos taurus (active) BI-8t
53 Trypstatin Kito &al ('88) J Biol Chem 263(34) 18104-07
54 Dendroaspis angusticeps (Eastern Green Mamba) C13 S2 C3 toxin (in Dufton '85)
55 Green Mamba I venom Creighton & Charles '87 CSHSQB 52:511-519.
56 beta bungarotoxin B2 (in Dufton '85)
57 Dendroaspis polylepis polylepis (Black mamba) venom I (in Dufton '85)
58 beta bungarotoxin B1 (in Dufton '85)
59 Bombyx mori (silkworm) SCI-III (SASA84)
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Table 14: Tally of Ionise f^l^j^* ions 

Identifier^ E K_R_^__ - £ J 6 16 

1 2 5 1 6 4 0 1 1 6 " 

2 2 2 4 6 4 0 ! 1 5 15 

5 3 2 2 3 J 11 5 15 

4 2 2 3 6 4 0 x 5 1S 

5 2 2 , 6 4 0 1 1 5 " 

6 2 2 ' ! 4 0 1 1 5 15 

7 2 2 3 6 4 x 5 17 

8 2 I 3 5 4 0 1 1 4 ^ 



10 2 3 3 6 4 0 J x 6 16 

11 2 2 4 x 6 16 

12 2 ? 3 7 4 0 1 1 5 " 
15 13 2 3 3 7 4 x g is 

14 2 2 1 6 4 0 1 1 6 " 

15 2 2 1 I 1 0 1 1 6 

16 2 2 t 5 1 0 1 1 4 " 

17 2 2 3 5 4 ! i 3 15 

18 ? 3 3 5 4 0 1 1 3 H 

19 2 I I 4 0 1 1 5 ^ 

20 2 2 4 5 4 x 2 14 

21 2 4 3 4 4 0 1 1 ^ IB 

22 2 43 4 l2 i 6 

23 2 t 5 4 4 0 1 1 4 ^ 

24 2354 l4 io 

25 1 1 2 i 3 ! 1 1 2 14 

26 2 3 \ 8 3 0 1 1 8 22 

27 2468J 1 _ 1 13 

28 2 1 2 7 2 2 1 1 4 H 
29l42/ l5 13 

30 1 2 l 3 4 2 1 1 3 " 

31 4 1 3 2 4 1 1 1 ° " 

32 1 4 ? 2 3 0 1 1 " 2 \t 

35 33 2 ! 2 2 3 1 1 1 ~\ 12 

34 242^ !!!! 

35 2 2 3 2 4 0 i x 3 1? 

36 1 5 4 5 4 1 x 7 „ 

37 0 2 6 3 3 3 19 

38 \ I I 5 4 0 1 1 4 18 

39 3 3 5 a a 0 1 1 " 3 

40 3 7 4 x 5 17 

41 3 2 4 6 5 Q ! i 10 18 

42 . 1 2 8 5 4 0 x ^ ^ 

43 1 4 g 4 5 0 1 1 10 18 



44 
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Table 14: Tally of Ionizable groups 



Identifier    D    E    K    R    Y    H   NH  CO2     +  ions
    45        0    2    8    4    5    0    1    1    10    16
    46        1    3    5    5    3    0    1    1     6    16
    47        3    4    4    3    3    0    1    1     0    16
    48        3    6    5    4    1    1    1    1     0    20
    49        0    3    3    5    5    0    1    1     5    13
    50        2    6    4    2    3    0    1    1    -2    16
    51        2    4    4    3    3    0    1    1     1    15
    52        1    4    6    2    3    0    1    1     3    15
    53        2    2    5    1    4    0    1    1     2    12
    54        2    3    6    8    3    1    1    1     9    21
    55        1    3    6    7    3    1    1    1     9    19
    56        6    2    6    7    4    3    1    1     5    23
    57        0    3    7    7    3    1    1    1    11    19
    58        6    2    5    7    4    2    1    1     4    22
    59        4    7    3    1    4    0    1    1    -7    17
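
The last two columns of Table 14 appear to follow from the counts in the same row. The Python sketch below is a hedged reading of the rows shown, not a rule stated in the text: it assumes that "+" is the net charge with K, R and the amino terminus counted as positive and D, E and the carboxy terminus as negative, that "ions" is the total number of these charged groups, and that Y and H are tallied but enter neither sum. The function names are hypothetical; the BPTI sequence is the one given residue by residue in Tables 13 and 16.

```python
from collections import Counter

# BPTI, residues 1-58, as listed in Tables 13 and 16.
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"


def net_and_total(d, e, k, r, nh=1, co2=1):
    """Return (net charge, number of charged groups) from residue counts."""
    net = k + r + nh - d - e - co2
    total = d + e + k + r + nh + co2
    return net, total


def tally_ionizable(sequence):
    """Build one Table 14 style row (as a dict) for a one-letter sequence."""
    counts = Counter(sequence)
    row = {aa: counts.get(aa, 0) for aa in "DEKRYH"}
    row["NH"], row["CO2"] = 1, 1
    row["+"], row["ions"] = net_and_total(row["D"], row["E"], row["K"], row["R"])
    return row


if __name__ == "__main__":
    print(tally_ionizable(BPTI))
```

Under these assumptions the counts of identifier 45 (D 0, E 2, K 8, R 4) give a net charge of 10 and 16 ions, and those of identifier 59 give -7 and 17, matching the table.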



Table 15: Frequency of Amino Acids at Each Position
in BPTI and 58 Homologues



2 

2 
5 

10 

11 

13 

10 

11 

9 

1 



13 
7 

10 
12 
2 
9 

11 
2 
5 



Res. 
Id 
-5 
-4 
-3 
-2 
-1 
1 
2 
3 
4 
5 
6 
7 
8 
9 

9a 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
32 
33 
34 



Different 



_Contetits^ 



-58 


D 


-58 


E 


-55 


P T 


-43 


R3 


-41 


D4 


R35 


K6 


P35 


R6 


D32 


K8 


F34 


A6 


C59 





Z F 



D4 P3 R2 T2 
T4 A3 H2 
A4 V4 H3 — 
S4 A3 T3 R2 « P2 



E G H K L 
Q2GKNZE 

G2 L M N P 1 D 

E3 N F I L t v 
G L ¥ 



L25 
L28 
P46 
P30 



3 

12 
7 

14 
8 

10 
5 
5 
6 
2 
4 

13 
11 
8 
7 

10 
2 

10 
11 
1 

13 



-58 
Y24 
T31 
G58 
P45 
C57 
K22 
A41 
R19 
141 
124 
R39 
Y3 5 
F32 
Y52 
N47 
A2 9 
K31 
A3 2 
G32 
L22 
C58 
Q25 
T25 
F59 
V24 



E2 5 K2 F Q S T 
H3 D2 G2 E I K 
A9 I4 V 4 R3 ?3 

G 



L A Q 
L F Q H 



R F T A 



E K 



S 2 G2 I N T P 



Q8 p 7 R3 A3 Y2 K S D 
K 

R7 L4 12 N 

^ T mo 9 N I A F G 

R 12 L7 V6 Y3 
G9 F2 D2 K2 Q2 R 
L8 K7 F5 M4 Y4H2 A2 
M F4L2V2E A 
P 12 R8 K5 S4 Q2 
A8 s5 Q 

Yl8 A 5 H2 S N 



L N E T 



V3R2E2GHFQ 
£3. K5 T4 Q3 L2 I B 

K13 QH A5 F2 R 

E17 L5 V5 K2 S A R A v 

Pl l K4 Q4 R3 E3 G2 b 
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Table 15: Frequency of Amino Acids at Each Position 
in BPTI and 58 Homologues



Res . 
Id . 


Different 
AAs 


-Contents 




First 


3 5 


2 


Y56 


W3 




Y 




3 


G50 


S8 R 




G 


7 7 


1 


G5 9 






G 


o o 


3 


C57 


A T 




C 




9 


R2 5 


m ^ Kfi 04 E3 M3 L2 D2 P 




R 


A n 

ft U 


2 


G35 


A 24 




A 


A 1 
ft X 


3 


N3 3 


K24 D2 - 




K 


ft Z 


12 


R2 2 


A1 0 GR S6 02 H2 N2 M D E K 


L 


R 


A "5 
ft JS 


2 


N57 






N 


A A 


3 


N4 0 


pi 4 




N 


a 

ft 3 


2 


F58 


-L 




F 


A ^ 
ft O 


11 


K3 9 


vq t?a co V2 D2 R H T A L 




K 


A *"7 
ft / 


2 


S36 


J. O 




s 


A P 
ft O 


11 


A2 3 


Til E6 06 L4 K2 T2 W2 S D 


R 


A 


A Q 
ft zf 


8 


E37 


KR Dfi 03 A2 P H T 




E 


50 


7 


E27 


D2 5 K2 L2 M Q Y 




D 


51 


2 


C58 


A 




C 


52 


9 


M17 


R15 E8 L7 K6 Q2 T2 H V 




M 


53 


11 


R3 7 


E6 Q5 K2 C2 H2 A N G D W 






54 


8 


T41 


Y5 A4 V3 12 E2 M K 






55 


1 


C59 






C 


56 


10 


G33 


V9 R5 14 E3 L A S T K , 




G 


57 


12 


G34 


V6 -5 A3 R2 12 P2 D K S L 


NG 


G 


58 


10 


A2 5 


-15 P7 K3 S2 Y2 G2 F D RA 




A 
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Table 16: Exposure in BPTI 



10 



Coordinates taken from 

Brookhaven Protein Data Bank entry 6PTI . 

HEADER PROTEINASE INHIBITOR (TRYPSIN) 13-MAY-87 
COMPND BOVINE PANCREATIC TRYPSIN INHIBITOR 

COMPND 2 (/BPTI $, CRYSTAL FORM /III$) 
AUTHOR A . WLODAWER 

Solvent radius = 1.40 
Atomic radii given in Table 7 



6PTI 



15 



Areas in Å²

                         Not                      Not
              Total      covered                  covered
Residue       area       by M/C      fraction     at all      fraction
ARG   1      342.45      205.09      0.5989       152.49      0.4453
PRO   2      239.12       92.65      0.3875        47.56      0.1989
ASP   3      272.39      158.77      0.5829       143.23      0.5258
PHE   4      311.33      137.82      0.4427        43.21      0.1388
CYS   5      241.06       48.36      0.2006         0.23      0.0010
LEU   6      280.98      151.45      0.5390       115.87      0.4124
GLU   7      291.39      128.91      0.4424        90.39      0.3102
PRO   8      236.12      128.71      0.5451        99.98      0.4234
PRO   9      236.09      109.82      0.4652        45.80      0.1940
TYR  10      330.97      153.63      0.4642        79.49      0.2402
THR  11      249.20       80.10      0.3214        64.99      0.2608
GLY  12      184.21       56.75      0.3081        23.05      0.1252
PRO  13      240.07      130.25      0.5426        75.27      0.3136
CYS  14      237.10       75.55      0.3186        53.52      0.2257
LYS  15      310.77      200.25      0.6444       192.00      0.6178
ALA  16      209.41       66.63      0.3182        45.59      0.2177
ARG  17      351.09      243.67      0.6940       201.48      0.5739
ILE  18      277.10      100.51      0.3627        58.95      0.2127
ILE  19      278.03      146.06      0.5254        96.05      0.3455
ARG  20      339.11      144.65      0.4266        43.81      0.1292
TYR  21      333.60      102.24      0.3065        69.67      0.2089
PHE  22      306.08       70.64      0.2308        23.01      0.0752
TYR  23      338.66       77.05      0.2275        17.34      0.0512
ASN  24      264.88       99.03      0.3739        38.69      0.1461
ALA  25      211.15       85.13      0.4032        48.20      0.2283
LYS  26      313.29      216.14      0.6899       202.84      0.6474



Table 16, continued.

                        Not                  Not
             Total      covered              covered
Residue      area       by M/C    fraction   at all    fraction
ALA  27     210.66      96.05     0.4560      54.78    0.2601
GLY  28     186.83      71.52     0.3828      32.09    0.1718
LEU  29     280.70     132.42     0.4718      93.61    0.3335
CYS  30     238.15      57.27     0.2405      19.33    0.0812
GLN  31     301.15     141.80     0.4709      82.64    0.2744
THR  32     251.26     138.17     0.5499      76.47    0.3043
PHE  33     304.27      59.79     0.1965      18.91    0.0622
VAL  34     251.56     109.78     0.4364      42.36    0.1684
TYR  35     332.64      80.52     0.2421      15.05    0.0452
GLY  36     187.06      11.90     0.0636       1.97    0.0105
GLY  37     185.28      84.26     0.4548      39.17    0.2114
CYS  38     234.56      73.64     0.3139      26.40    0.1125
ARG  39     417.13     304.62     0.7303     250.73    0.6011
ALA  40     209.53      94.01     0.4487      52.95    0.2527
LYS  41     314.60     166.23     0.5284     108.77    0.3457
ARG  42     349.06     232.83     0.6670     179.59    0.5145
ASN  43     266.47      38.53     0.1446       5.32    0.0200
ASN  44     269.65      91.08     0.3378      23.39    0.0867
PHE  45     313.22      69.73     0.2226      14.79    0.0472
LYS  46     309.83     217.18     0.7010     155.73    0.5026
SER  47     224.78      69.11     0.3075      24.80    0.1103
ALA  48     211.01      82.06     0.3889      31.07    0.1473
GLU  49     286.62     161.00     0.5617     100.01    0.3489
ASP  50     299.53     156.42     0.5222      95.96
CYS  51     238.68      24.51     0.1027       0.00    0.0000
MET  52     293.05      89.48     0.3054      66.70    0.2276
ARG  53     356.20     224.61     0.6306     189.75    0.5327
THR  54     251.53     116.43     0.4629      51.64    0.2053
CYS  55     240.40      69.95     0.2910       0.00    0.0000
GLY  56     184.66      60.79     0.3292      32.78    0.1775
GLY  57     106.58      49.71     0.4664      38.28    0.3592
ALA  58     no position given in Protein Data Bank




" Total 


area" 


is the area measured by a rolling sphere of
radius 1.4 Å, where only the atoms within the
residue are considered. This takes account of
conformation.

"Not covered by M/C" is the area measured by a rolling sphere
of radius 1.4 Å where all main-chain atoms are
considered. Fraction is the exposed area divided
by the total area. Surface buried by main-chain
atoms is more definitely covered than is surface
covered by side group atoms.

"Not covered at all" is the area measured by a rolling sphere
of radius 1.4 Å where all atoms of the protein
are considered.
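
The two "fraction" columns of Table 16 follow directly from the definition in the notes: an exposed area divided by the residue's total area. The short Python sketch below recomputes them for three rows copied from the table. The function name `exposure_fractions` and the tuple layout are illustrative assumptions and are not part of the original calculation.

```python
def exposure_fractions(total_area, not_covered_by_mc, not_covered_at_all):
    """Return the two fractions of Table 16 for one residue.

    Each fraction is an exposed area divided by the total area of the
    isolated residue (all areas in square angstroms).
    """
    if total_area <= 0:
        raise ValueError("total area must be positive")
    return (not_covered_by_mc / total_area,
            not_covered_at_all / total_area)


# Rows copied from Table 16:
# (residue, total area, not covered by M/C, not covered at all)
rows = [
    ("ARG 1", 342.45, 205.09, 152.49),
    ("CYS 5", 241.06, 48.36, 0.23),
    ("LYS 15", 310.77, 200.25, 192.00),
]

for name, total, mc, at_all in rows:
    f_mc, f_all = exposure_fractions(total, mc, at_all)
    print(f"{name:7s} {f_mc:.4f} {f_all:.4f}")
```

Rounded to four places, the printed values can be compared against the tabulated fractions as a consistency check on the transcription.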



Table 17 



Plasmids used in Detailed Example I 



SfStn fitf and col.1 of PBR322 doned into 
P Aat II/Rcc I sites 

pLG3 5^2 with ftss I Bite --oved^ 

^ ' ?nfo IX/S ? "ites.^X/ASH XI 3i«s 

„?tn second part of c^pM 9«» ^ 

Into Avr II/ASH 11 ; s |Sfd gene cloned 

p L G5 th,rd par - 2^ f ite creaCed 

P^ ^iSrt o £ c^ 9 ene cloned 
l iU^led^ — . same len 9 th 

pLGlO P + ^* gene _ am£ R gene 
pLGH ^ 



pLG4 
pLG5 



AhaII
FspI
EcoRI
SmaI
HindIII
HindII
AatII
BbvII
BstBI
Eco57I
EspI
NheI
PflMI
RsrI
SpeI
XcaI
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Table 18: Enzyme sites eliminated when 
M13mp18 is cut by AvaII
and Bsu36I

NarI    GdiII    PvuI
BglI    HgiEII   Bsu36I
SacI    KpnI     XmaI
BamHI   XbaI     SalI
AccI    PstI     SphI



Table 19: Enzymes not cutting
M13mp18

AflII     ApaI       AvrII
BclI      BspMI      BssHI
BstEII    BstXI      EagI
EcoNI     EcoO109I   EcoRV
HpaI      MluI       NcoI
NotI      NruI       NsiI
PmaCI     PpaI       PpuMI
SacI      ScaI       SfiI
StuI      StyI       Tth111I
XhoI



Table 20: Enzymes cutting
AmpR gene and ori

AatII    BbvII     Eco57I   PpaI
ScaI     Tth111I   AhaII    GdiII
PvuI     FspI      BglI     HgiEII
HindII   PstI      XbaI     AflIII
NdeI
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Table 21: Enzymes tested on Ambig DNA 



Enzyme      Recognition     Symm  cuts    Supply
%AccI       GTMKAC          P     2 & 4   <B,M,I,N,P,T
 AflII      CTTAAG          P     1 & 5   <N
 ApaI       GGGCCC          P     5 & 1   <M,I,N,P,T
 AsuII      TTCGAA          P     2 & 4   <P,N(BstBI)
 AvaIII     ATGCAT          P     5 & 1   <T; NsiI:M,N,P,T; EcoT22I:T
 AvrII      CCTAGG          P     1 & 5   <N
 BamHI      GGATCC          P     1 & 5   <S,B,M,I,N,P,T
 BclI       TGATCA          P     1 & 5   <S,B,M,I,N,T
 BspMII     TCCGGA          P     1 & 5   <N
 BssHII     GCGCGC          P     1 & 5   <N,T
+BstEII     GGTNACC         P     1 & 6   <S,B,M,N,T
%BstXI      CCANNNNN        P     8 & 4   <N,P,T
+DraII      RGGNCCY         P     2 & 5   <M,T; EcoO109I:N
+EcoNI      CCTNNNNN        P     5 & 6   <N(soon)
 EcoRI      GAATTC          P     1 & 5   <S,B,M,I,N,P,T
 EcoRV      GATATC          P     3 & 3   <S,B,M,I,N,P,T
+EspI       GCTNAGC         P     2 & 5   <T
 HindIII    AAGCTT          P     1 & 5   <S,B,M,I,N,P,T
 HpaI       GTTAAC          P     3 & 3   <S,B,M,I,N,P,T
 KpnI       GGTACC          P     5 & 1   <S,B,M,I,N,P,T; Asp718:M
 MluI       ACGCGT          P     1 & 5   <M,N,P,T
 NarI       GGCGCC          P     2 & 4   <B,N,T
 NcoI       CCATGG          P     1 & 5   <B,M,N,P,T
 NheI       GCTAGC          P     1 & 5   <M,N,P,T
 NotI       GCGGCCGC        P     2 & 6   <M,N,P,T
 NruI       TCGCGA          P     3 & 3   <B,M,N,T
+PflMI      CCANNNNN        P     7 & 4   <N
 PmaCI      CACGTG          P     3 & 3   <none
+PpuMI      RGGWCCY         P     2 & 5   <N
+RsrII      CGGWCCG         P     2 & 5   <N,T
 SacI       GAGCTC          P     5 & 1   <B(SstI),M,I,N,P
 SalI       GTCGAC          P     1 & 5   <B,M,I,N,P,T
+SauI       CCTNAGG         P     2 & 5   <M; CvnI:B; MstII:T; Bsu36I:N; AocI:T
+SfiI       GGCCNNNNNGGCC   P     8 & 5   <N,P,T
 SmaI       CCCGGG          P     3 & 3   <B,M,I,N,P,T
 SpeI       ACTAGT          P     1 & 5   <M,N,T
 SphI       GCATGC          P     5 & 1   <B,M,I,N,P,T
 StuI       AGGCCT          P     3 & 3   <M,N,I(AatI),P,T
%StyI       CCWWGG          P     1 & 5   <N,P,T
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TABLE 21, continued. 



 XcaI       GTATAC          P     3 & 3   <N(soon)
 XhoI       CTCGAG          P     1 & 5   <B,M,I,P,T; CcrI:T; PaeR7I:N
 XmaI       CCCGGG          P     1 & 5   <I,N,P,T
 XmaIII     CGGCCG          P     1 &     Eco52I:T

N restrct = 43
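
Several recognition sequences in Table 21 use ambiguity symbols (N, R, Y, W, M, K). A minimal Python sketch of how such a pattern can be searched for in a DNA string is given below. It assumes the standard IUPAC meaning of each symbol; the function name `find_sites` and the short test fragments are illustrative, and the positions it reports are 1-based starts of the recognition sequence, which need not match the numbering convention used elsewhere in these tables.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "W": "[AT]", "S": "[CG]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(sequence, recognition):
    """Return 1-based start positions of every match of `recognition`.

    A lookahead is used so that overlapping occurrences are all reported.
    Only the strand given is scanned, which is sufficient for the
    palindromic ("P") sites listed in Table 21.
    """
    pattern = "".join(IUPAC[base] for base in recognition.upper())
    regex = re.compile("(?=(" + pattern + "))")
    return [m.start() + 1 for m in regex.finditer(sequence.upper())]


fragment = "CTGCAAGCTTCTGCT"              # short stretch containing AAGCTT
print(find_sites(fragment, "AAGCTT"))      # HindIII
print(find_sites("GCCATGGTG", "CCWWGG"))   # StyI, W = A or T
print(find_sites("GCCATGGTG", "GAATTC"))   # EcoRI, no site in this fragment
```

Running the same function over a full gene sequence with each recognition string in turn gives a list of cutters and non-cutters of the kind summarized in a restriction-cut table.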
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Table 22: ipbd gene (.SE<fr 111 KJO 1 



pbd mod10 29III88 :

lacUV5 RsrII/AvrII/gene/TrpA attenuator/MstII; !
5'- CGGaCCG TaT ! RsrII site
CCAGGC tttaca CTTTATGCTTCCGGCTCG tataat GTG ! lacUV5
aATTGTGAGCGGATAACAATT ! lacO operator
CCT AGGAgg CtcaCT ! Shine-Dalgarno
atg aag aaa tct ctg gtt ctt aag gct agc ! 10 M13 leader
gtt gct gtc gcg acc ctg gta ccg atg ctg ! 20
tct ttt gct cgt ccg gat ttc tgt ctc gag ! 30
ccg cca tat act ggg ccc tgc aaa gcg cgc ! 40
atc atc cgt tat ttc tac aac gct aaa gca ! 50
ggc ctg tgc cag acc ttt gta tac ggt ggt ! 60
tgc cgt gct aag cgt aac aac ttt aaa tcg ! 70
gcc gaa gat tgc atg cgt acc tgc ggt ggc ! 80
gcc gct gaa ggt gat gat ccg gcc aaa gcg ! 90
gcc ttt aac tct ctg caa gct tct gct acc ! 100
gaa tat atc ggt tac gcg tgg gcc atg gtg ! 110
gtg gtt atc gtt ggt gct acc atc ggt atc ! 120
aaa ctg ttt aag aaa ttt act tcg aaa gcg ! 130
tct taa tag tga ggttacc ! BstEII
agtcta agcccgc ctaatga gcgggct tttttttt ! terminator
CCTgAGG -3' ! MstII



Table 23: ipbd DNA sequence

DNA Sequence file = UV5_M13PTIM13.DNA;17

DNA Sequence title =
pbd mod10 29III88 : lac-UV5 RsrII/AvrII/gene/TrpA attenuator/MstII; !

  1   C|GGA|CCG|TAT|CCA|GGC|TTT|ACA|CTT|TAT|GCT|TCC|GGC|TCG
 41 TAT|AAT|GTG|TGG|AAT|TGT|GAG|CGG|ATA|ACA|ATT|CCT|AGG|AGG
 83 CTC|ACT|ATG|AAG|AAA|TCT|CTG|GTT|CTT|AAG|GCT|AGC|GTT|GCT
125 GTC|GCG|ACC|CTG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|CGT|CCG|GAT
167 TTC|TGT|CTC|GAG|CCG|CCA|TAT|ACT|GGG|CCC|TGC|AAA|GCG|CGC
209 ATC|ATC|CGT|TAT|TTC|TAC|AAC|GCT|AAA|GCA|GGC|CTG|TGC|CAG
251 ACC|TTT|GTA|TAC|GGT|GGT|TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT
293 AAA|TCG|GCC|GAA|GAT|TGC|ATG|CGT|ACC|TGC|GGT|GGC|GCC|GCT
335 GAA|GGT|GAT|GAT|CCG|GCC|AAA|GCG|GCC|TTT|AAC|TCT|CTG|CAA
377 GCT|TCT|GCT|ACC|GAA|TAT|ATC|GGT|TAC|GCG|TGG|GCC|ATG|GTG
419 GTG|GTT|ATC|GTT|GGT|GCT|ACC|ATC|GGT|ATC|AAA|CTG|TTT|AAG
461 AAA|TTT|ACT|TCG|AAA|GCG|TCT|TAA|TAG|TGA|GGT|TAC|CAG|TCT
503 AAG|CCC|GCC|TAA|TGA|GCG|GGC|TTT|TTT|TTT|CCT|GAG|G

Total = 539 bases
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Table 24 : Summary of Restriction Cuts 



Enz = %Acc I has 1 observed sites : 259
Enz = Acc III has 1 observed sites : 162
Enz = Acy I has 1 observed sites : 328
Enz = Afl II has 1 observed sites : 109
Enz = %Afl III has 1 observed sites : 404
Enz = Aha III has 1 observed sites : 292
Enz = Apa I has 1 observed sites : 193
Enz = Asp718 has 1 observed sites : 138
Enz = Asu II has 1 observed sites : 471
Enz = %Ava I has 1 observed sites : 175
Enz = Avr II has 1 observed sites : 76
Enz = %Ban I has 3 observed sites : 138 328 540
Enz = Bbe I has 1 observed sites : 328
Enz = +Bgl I has 1 observed sites : 352
Enz = +Bin I has 1 observed sites : 346
Enz = %BspM I has 1 observed sites : 319
Enz = BssH II has 1 observed sites : 205
Enz = +BstE II has 1 observed sites : 493
Enz = %BstX I has 1 observed sites : 413
Enz = Cfr I has 2 observed sites : 299 350
Enz = +Dra II has 1 observed sites : 193
Enz = +Esp I has 1 observed sites : 277
Enz = %Fok I has 1 observed sites : 213
Enz = Gdi II has 2 observed sites : 299 350
Enz = Hae I has 1 observed sites : 240
Enz = Hae II has 1 observed sites : 328
Enz = +Hga I has 1 observed sites : 478
Enz = %HgiC I has 3 observed sites : 138 328 540
Enz = %HgiJ II has 1 observed sites : 193
Enz = Hind III has 1 observed sites : 377
Enz = +Hph I has 1 observed sites : 340
Enz = Kpn I has 1 observed sites : 138
Enz = +Mbo II has 2 observed sites : 93 304
Enz = Mlu I has 1 observed sites : 404
Enz = Nar I has 1 observed sites : 328
Enz = Nco I has 1 observed sites : 413
Enz = Nhe I has 1 observed sites : 115
Enz = Nru I has 1 observed sites : 128
Enz = Nsp (7524) has 1 observed sites : 311
Enz = NspB II has 1 observed sites : 332
Enz = +PflM I has 1 observed sites : 184
Enz = +Pss I has 1 observed sites : 193
Enz = +Rsr II has 1 observed sites :
Enz = +Sau I has 1 observed sites : 535
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Table 24: Summary of Restriction Cuts 



Enz = %SfaN I has 2 observed sites : 144 209
Enz = +Sfi I has 1 observed sites : 351
Enz = Sph I has 1 observed sites : 311
Enz = Stu I has 1 observed sites : 240
Enz = %Sty I has 2 observed sites : 76 413
Enz = Xca I has 1 observed sites : 259
Enz = Xho I has 1 observed sites : 175
Enz = Xma III has 1 observed sites : 299

Enzymes that do not cut

Aat II     AlwN I     ApaL I     Ase I      Ava III    Bal I
BamH I     Bbv I      Bbv II     Bcl I      Bgl II     Bsm I
BspH I     Cla I      Dra III    Eco47 III  EcoN I     EcoR I
EcoR V     HgiA I     Hinc II    Hpa I      Mst I      Nae I
Nde I      Not I      Ple I      PmaC I     PpuM I     Pst I
Pvu I      Pvu II     Sac I      Sac II     Sal I      Sca I
Sma I      SnaB I     Spe I      Ssp I      Taq II     Tth111 I
Tth111 II  Xho II     Xma I      Xmn I
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Table 25: Annotated Sequence of ipbd gene 



5'- C | GGA | CCG | TAT | CCA | GGC | TTT | ACA | CTT | TAT |     28
    |   Rsr II    |                 |    -35    |

| GCT | TCC | GGC | TCG | TAT | AAT | GTG | TGG |     52
                        |    -10    |

| AAT | TGT | GAG | CGG | ATA | ACA | ATT |     73
|              lac operator               |

| CCT | AGG | AGG | CTC | ACT |     88
|  Avr II   |
      |   S. D.   |

|  m  |  k  |  k  |  s  |  l  |  v  |  l  |  k  |  a  |  s  |
|  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 10  |
| ATG | AAG | AAA | TCT | CTG | GTT | CTT | AAG | GCT | AGC |     118
                                    |  Afl II   |   Nhe I   |

|  v  |  a  |  v  |  a  |  t  |  l  |  v  |  p  |  m  |  l  |
| 11  | 12  | 13  | 14  | 15  | 16  | 17  | 18  | 19  | 20  |
| GTT | GCT | GTC | GCG | ACC | CTG | GTA | CCG | ATG | CTG |     148
            |      Nru I      |      Kpn I      |

|  s  |  f  |  a  |  r  |  p  |  d  |  f  |  c  |  l  |  e  |
| 21  | 22  | 23  | 24  | 25  | 26  | 27  | 28  | 29  | 30  |
| TCT | TTT | GCT | CGT | CCG | GAT | TTC | TGT | CTC | GAG |     178
                  |     Acc III     |           |   Ava I   |
                                                |   Xho I   |

|  p  |  p  |  y  |  t  |  g  |  p  |  c  |  k  |  a  |  r  |
| 31  | 32  | 33  | 34  | 35  | 36  | 37  | 38  | 39  | 40  |
| CCG | CCA | TAT | ACT | GGG | CCC | TGC | AAA | GCG | CGC |     208
      |        PflM I         |                 |  BssH II  |
                        |   Apa I   |
                        |  Dra II   |
                        |   Pss I   |
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Table 25, continued 



|  i  |  i  |  r  |  y  |  f  |  y  |  n  |  a  |  k  |
| 41  | 42  | 43  | 44  | 45  | 46  | 47  | 48  | 49  |
| ATC | ATC | CGT | TAT | TTC | TAC | AAC | GCT | AAA |     235

|  a  |  g  |  l  |  c  |  q  |  t  |  f  |  v  |  y  |  g  |  g  |
| 50  | 51  | 52  | 53  | 54  | 55  | 56  | 57  | 58  | 59  | 60  |
| GCA | GGC | CTG | TGC | CAG | ACC | TTT | GTA | TAC | GGT | GGT |     268
|      Stu I      |                       |   Acc I   |
                                          |   Xca I   |

|  c  |  r  |  a  |  k  |  r  |  n  |  n  |  f  |  k  |
| 61  | 62  | 63  | 64  | 65  | 66  | 67  | 68  | 69  |
| TGC | CGT | GCT | AAG | CGT | AAC | AAC | TTT | AAA |     295
            |      Esp I      |

|  s  |  a  |  e  |  d  |  c  |  m  |  r  |  t  |  c  |  g  |
| 70  | 71  | 72  | 73  | 74  | 75  | 76  | 77  | 78  | 79  |
| TCG | GCC | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT |     325
|     Xma III     |     |      Sph I      |

|  g  |  a  |  a  |  e  |  g  |  d  |  d  |
| 80  | 81  | 82  | 83  | 84  | 85  | 86  |
| GGC | GCC | GCT | GAA | GGT | GAT | GAT |     346
|   Bbe I   |
|   Nar I   |

|  p  |  a  |  k  |  a  |  a  |
| 87  | 88  | 89  | 90  | 91  |
| CCG | GCC | AAA | GCG | GCC |     361
|            Sfi I            |

|  f  |  n  |  s  |  l  |  q  |  a  |  s  |  a  |  t  |
| 92  | 93  | 94  | 95  | 96  | 97  | 98  | 99  | 100 |
| TTT | AAC | TCT | CTG | CAA | GCT | TCT | GCT | ACC |     388
                        |    Hind III     |

|  e  |  y  |  i  |  g  |  y  |  a  |  w  |
| 101 | 102 | 103 | 104 | 105 | 106 | 107 |
| GAA | TAT | ATC | GGT | TAC | GCG | TGG |     409
                        |      Mlu I      |
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Table 25, continued 

|  a  |  m  |  v  |  v  |  v  |
| 108 | 109 | 110 | 111 | 112 |
| GCC | ATG | GTG | GTG | GTT |     424
|           BstX I            |
|      Nco I      |

|  i  |  v  |  g  |  a  |  t  |  i  |  g  |  i  |
| 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 |
| ATC | GTT | GGT | GCT | ACC | ATC | GGT | ATC |     448

|  k  |  l  |  f  |  k  |  k  |  f  |  t  |  s  |  k  |  a  |
| 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 |
| AAA | CTG | TTT | AAG | AAA | TTT | ACT | TCG | AAA | GCG |     478
                                    |     Asu II      |

|  s  |  .  |  .  |  .  |
| 131 | 132 | 133 | 134 |
| TCT | TAA | TAG | TGA | GGT | TAC | CAG | TCT | ...     502
                        |     BstE II     |

| AAG | CCC | GCC | TAA | TGA | GCG | GGC | TTT | TTT | TTT |     532
|                      Trp terminator                       |

| CCT | GAG | G -3'     539
|   Sau I   |

Note the following enzyme equivalences,

Xma III = Eag I
Acc III = BspM II
Dra II  = EcoO109 I
Asu II  = BstB I
Sau I   = Bsu36 I
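
The one-letter amino acid row above each codon row in Table 25 can be checked mechanically by translating the codons with the standard genetic code. The following Python sketch does this for two short stretches copied from the table. The names `CODON_TABLE` and `translate` are illustrative assumptions, and the sketch presumes the standard code applies, as is usual for expression in E. coli.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(translate("ATGAAGAAATCTCTGGTTCTTAAGGCTAGC"))  # codons 1-10
print(translate("TCTTAATAGTGA"))                    # codon 131 and the stops
```

The first call should spell the same ten residues shown above codons 1 to 10, and the second shows the serine at 131 followed by the three consecutive stop codons.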
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i 1 ttt I ACAl CTT I TAT \ 

• _s2acer__J__££i_-± L 

i rprr I CGC \ TCG I TAT \ AAT \ GTG \ TGG | 
| GCT | TCC I GGC I l^_iO___l 

10 

lac^oEerator^ 



15 



|CCT|AGGl 
J__Avr_IlI 



20 



25 



12811291130 
lg cc|gctlccTlTCG|AAA|GCGl 
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Table 27: DNA_synthl LS€CS ID »40 * tg^O 

5'- | CCG | TCC | GTC | GGA | CCG | TAT | CCA | GGC | TTT | ACA | CTT | TAT |

    | GCT | TCC | GGC | TCG | TAT | AAT | GTG | TGG |

    | AAT | TGT | GAG | CGG | ATA | ACA | ATT |

    | CCT | AGG |
      gga   tcc

    / 3' = olig#3
    | GCC | GCT | CCT | TCG | AAA | GCG |
      cgg   cga   gga   agc   ttt   cgc

    | TCT | TAA | TAG | TGA | GGT | TAC | CAG | TCT |
      aga   att   atc   act   cca   atg   gtc   aga

    | AAG | CCC | GCC | TAA | TGA | GCG | GGC | TTT | TTT | TTT |
      ttc   ggg   cgg   att   act   cgc   ccg   aaa   aaa   aaa

    | CCT | GAG | GCA | GGT | GAG | CG
      gga   ctc   cgt   cca   ctc   gc -5'

"Top" strand     99
"Bottom" strand  100
Overlap          23 (14 c/g and 9 a/t)
Net length       158
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Table 



5 ' - |gca| cca | acg 
| spacer 



.e 28: DNA seq2 (S€Q, tO hj Q^A^ f) 
Prof*** s< j,, gt , >r . 



| CCT.| AGG | AGG | CTC | ACT | 
1 Avr Il| 

I S. D. I 



10 



15 



20 



25 



30 



1 | k 
7 | 8 
CTT | AAG 
Afl II 



ra | k | k | s | 1 | v 
|1|2|3|4|5|6 
j ATG j AAG | AAA | TCT j CTG | GTT 



v | a | v | a | t | 1 
| 11 j 12| 13 j 14| 15 | 16 
| GTT | GCT | GTC j GCG j ACC j CTG 
I Nru I I 



b | f | a| r |p | d | f | c 

21 j 22| 23 j 24 j 25 j 26 | 27 | 28 
TCT j TTT | GCT | CGT j CCG j GAT j TTC j TGT 
lAccIIll 




P I P I y I t | g | p | c | k 

31 1 32 j 33 j 34| 35 j 36 | 37 | 38 
CCG | CCA | TAT | ACT j GGG | CCC | TGC | AAA 
I PflM I I 



a | s | 
9 | 10 | 
GCT I AGC | 
Nhe I | 



| Dra 




| Pss 


i I 



1 | e | 
29 | 30 | 
CTC | GAG | 
Ava I 



Xho I 



a | r | 
39 | 40 | 
GCG | CGC j 
BssH III 



35 



40 



i | i I r | 
41 | 42 | 43 | 
ate j ate j cgt j 



| t | s | k | 
| 127 | 128 | 129 | 
j ACT j TCG | AAa j gcg | get | gcg | 
| Asu II 1 spacer 



10 



20 



25 



35 
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Table 29: DNA_synth2 I b KiQMS'O. 



5'- | GCA | GCA | ACG

    | CCT | AGG | AGG | CTC | ACT |

    | ATG | AAG | AAA | TCT | CTG | GTT | CTT | AAG | GCT | AGC |

    | GTT | GCT | GTC | GCG | ACC | CTG | GTA | CCG | ATG | CTG |
                                   olig#6 = 3'- ggc   tac   gac

    / 3' = olig#5
    | TCT | TTT | GCT | CGT | CCG | GAT | TTC | TGT | CTC | GAG |
      aga   aaa   cga   gca   ggc   cta   aag   aca   gag   ctc

    | CCG | CCA | TAT | ACT | GGG | CCC | TGC | AAA | GCG | CGC |
      ggc   ggt   ata   tga   ccc   ggg   acg   ttt   cgc   gcg

    | ATC | ATC | CGT |
      tag   tag   gca

    | ACT | TCG | AAA | GCG | GCT | GCG |
      tga   agc   ttt   cgc   cga   cgc -5'

"Top" strand     99
"Bottom" strand  99
Overlap          24 (14 c/g and 10 a/t)
Net length       155
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Table 30: DNA_seq3 LS&Sl \b MO'. U3) 



5 1 - 



, i | i | r | y | f 
| 41 | 42 I 43 | 44 | 45 
| ATC | ATC | CGT | TAT j TTC 

| a | g | 1 | c | q 
| 50 j 51 | 52 j 53 | 54 
| GCA | GGC | CTG j TGC | CAG 
I Stu I I 



ccc | tgc | aca 
spacer 



c I 
61 1 



r 
62 



a 
63 



64 I 



r 
65 



Y 1 


n | 


a | k | 


46 | 


47| 


48 | 49 | 


TAC| 


AAC | 


GCT | AAA j 


t | 


f 1 


v | y | 


55 


56 


57 j 58 | 


ACC 


TTT 


GTA | TAC | 






Acc I | 






Xca I | 




1 n 


1 f 1 k | 


66 


1 67 


j 68 | 69 | 


AAC 


| AAC 


| TTT | AAA | 



a | r | 
39 | 40 | 
GCG j CGC | 
BssH II 



1 Esp I L 

s | a | e | d | c | m | r | t | c 

| 70 | 71 | 72 j 73 j 74 j 75 j 76 j 77 | 78 
| TCG | GCC | GAA | GAT | TGC j ATG | CGT | ACC j TGC 
|XmaIIl| I Sph l| 



9 I 9 I 
59 j 60 | 
GGT GGT 



g I 

79 | 
GGT 



1 g 1 a | 




| 80 | 81 




| GGC | GCC 


get |gaa 


| Bbe I 


spacer 


| Nar I | 



I t | s | k | 
j 127 j 128 | 129 | 
| ttt j acT | TCG | AAa j gcg | teg | ccg 
I Asu II I 
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Table 31: DNA 



synth3

5'- | CCC | TGC | ACA | GCG | CGC |

    | ATC | ATC | CGT | TAT | TTC | TAC | AAC | GCT | AAA |

    | GCA | GGC | CTG | TGC | CAG | ACC | TTT | GTA | TAC | GGT | GGT |

    | TGC | CGT | GCT | AAG | CGT | AAC | AAC | TTT | AAA |
      acg   gca   cga   ttc   gca   ttg   ttg   aaa   ttt

    | TCG | GCC | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT |
      agc   cgg   ctt   cta   acg   tac   gca   tgg   acg   cca

    | GGC | GCC | GCT | GAA |
      ccg   cgg   cgt   ctt

    | TTT | ACT | TCG | AAA | GCG | TCG | CCG |
      aaa   tga   agc   ttt   cgc   agc   ggc -5'

"Top" strand     93
"Bottom" strand  97
Overlap          25 (15 g/c & 10 a/t)
Net length       146
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Table 32: DNA_seq4 jft g ffi 'h fsff?'-\9^ 

|g|a|a|e|g|d|d| 
■5' j 80 j 81 j 82 J 83 j 84 j 85 | 86 1 

| cc 1 1 cgc | cct | GGC | GCC | GCT j. GAA | GGT j GAT j GAT j 
1 spacer 1 Bbe I | 
I Nar I 1 

| p | a | k | a | a | 
| 87 1 88 j 89 J 90 | 91 1 
| CCG | GCC j AAA | GCG | GCC j 
1 Sfi I L 

|f|n|s|l|q|a|s|a|t| 
j 92 | 93 j 94 | 95 j 96 | 97 j 98 j 99|l00| 
| TTT | AAC | TCT | CTG | CAA | GCT j TCT j GCT j ACC j 

| Hind 3 | 

I e | y | i | g | y | a | w | 
j 101 j 102 j 103 j 104 | 105 | 106 | 107 | 
| GAA | TAT | ATC | GGT | TAC j GCG j TGG j 

| Mlu II 

| a | m | v | v | v | 
j 108 | 109 | 110 | 111 | 112 | 
| GCC | ATG | GTG | GTG | GTT j 

1 BstX I [ 

1 Nco I 1 



|i|v|g|a|t|i|g|i| 
j 113 j 114 j 115 j 116 j 117 | 118 | 119 | 120 | 
| ATC | GTT | GGT | GCT j ACC | ATC | GGT | ATC j 

|k|l|f|k|k|f|t|s|k| 
j 121 j 122 j 123 j 124 j 125 | 126 | 127 | 128 | 129 | 

| AAA | CTG | TTT j AAG j AAA | TTT | ACT j TCG j AAa j gcg | teg | ggc | - 3' 

| Asu II | spacer [_ 
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Table 33: DNA_synth4 CS6Q >&K1Q: I5*t) 

5'- | GCT | CGC | CCT | GGC | GCC | GCT | GAA | GGT | GAT | GAT |

    | CCG | GCC | AAA | GCG | GCC |

    | TTT | AAC | TCT | CTG | CAA | GCT | TCT | GCT | ACC |

    | GAA | TAT | ATC | GGT | TAC | GCG | TGG |
    olig#10 = 3'- ata tag cca atg cgc acc

    / 3' = olig#9
    | GCC | ATG | GTG | GTG | GTT |
      cgg   tac   cac   cac   caa

    | ATC | GTT | GGT | GCT | ACC | ATC | GGT | ATC |
      tag   caa   cca   cga   tgg   tag   cca   tag

    | AAA | CTG | TTT | AAG | AAA | TTT | ACT | TCG | AAA | GCG | TCT | TGA |
      ttt   gac   aaa   ttc   ttt   aaa   tga   agc   ttt   cgc   aga   act -5'

"Top" strand     100
"Bottom" strand  93
Overlap          25 (14 c/g and 11 a/t)
Net length       149
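
In the synthesis tables the bottom strand is written 3' to 5' directly beneath the top strand, so each lower-case triplet is the base-by-base complement of the codon above it. The Python sketch below is one way to check such a pair and to tally c/g versus a/t bases as quoted for the overlaps. The helper names are illustrative assumptions; the example stretch GCC|ATG|GTG|GTG|GTT is copied from Table 33.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(strand):
    """Base-by-base complement, kept in the same left-to-right order.

    Written under a 5'->3' top strand, the result reads 3'->5', which is
    how the bottom strands are laid out in the synthesis tables.
    """
    return strand.translate(COMPLEMENT)


def reverse_complement(strand):
    """Complementary strand rewritten in the conventional 5'->3' order."""
    return complement(strand)[::-1]


def gc_at_counts(strand):
    """Return (number of c/g bases, number of a/t bases)."""
    s = strand.upper()
    gc = s.count("G") + s.count("C")
    return gc, len(s) - gc


top = "GCCATGGTGGTGGTT"            # GCC|ATG|GTG|GTG|GTT from Table 33
print(complement(top).lower())      # bottom strand as printed, 3'->5'
print(reverse_complement(top))      # the same strand read 5'->3'
print(gc_at_counts(top))
```

Applying `gc_at_counts` to the region shared by the two oligonucleotides gives the c/g and a/t split reported on each "Overlap" line, provided the overlap boundaries are known.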



Table 34: Some interaction sets in BPTI



15 



20 



25 



30 



35 



Contents 
D -32 
E -32 
T p F 



BPTI 



2 3 4 5 



-29 



Z3 R3 Q2 T2 H 
m T2 P2 Q2 E 
R21 A2 K2 H2 



45 



G L K E -18 

G N K R - 18 
p L I T G D 
R21 ol £ H2 N E V F L 
S°5 « « S « B , 0 ^ * 
? 19 D4 L3 Y2 12 A2 S 

033 v , no 12 Y2 D2 T R 

L ll E5 N4 K3 Q2 H 

L 18 EH K2 S Q 

P26 H2 A2 I _ |, y F 
pl7 A6 V3 R2 Q L K ^2 S I D 
YH E7 D4 A2 N2 R2 V 
T17 P5 A3 R2 I S Q * 



R 



N I 



. 2 V G A I N F 



G32 K 
P22 R6 L3 
C31 T A 

K15 R4 Y2 M2 L2 
A2 2 G5 Q2 R K D * L M T G P 

R12 K5 A2 Y3 H2 S2 

T21 M4 F3 L2 V2 T 

ll! P10 R6 S2 K2 L Q 

Rl9 A 7 S4 L2 Q 

Y18 F13 VJ I 

F 14 Y14 H2 A N S 

Y32 F 

N26 S 03 P3 W3 L2 T2 K G R 
, K16 A.6 T2 E2 S2 R2 G H "V 

3 L9 Q7 K7 A2 F2 R2 M G T N 



vll 18 T3 D2 Q2 
Y31 

G27 S5 R 
G33 

C31 T A 

R 13 G9 K4 Q3 D2 



P 

C 

K 

A 

R 

I 



I 

R 

Y 
F 
Y 
N 



s 

s 

1 s 

X 

1 
1 
1 
1 
1 
1 
1 
s 



5 

s 5 
4 s 
s 5 
X x 
4 
\ 4 
4 
4 
4 
4 
x 

4 S 
s 
4 
s 



3 
3 
s 
3 
x 
s 
s 
> 3 

3 S 
2 3 



s s 
2 3 



s 
2 
s 

s 



s 



F H p R K 



P M 



C 

Q 

T 

F 

V 

Y 

G 

G 

C 

R 



x 
1 
s 
1 

X 

1 
1 



s 
s 
3 
s 
3 
s 

3 4 
3 L 

S ! 

3 
x 
3 
3 
X 
3 
s 



5 
s 
5 
s 
5 
s 
5 
\ s 



s 
2 
2 
2 
x 
2 
2 
x 
2 
s 



4 
s 

X 

s 



X 

5 
s 
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Table 34: continued. 



5 Number 





Res . 

XL 
# 


JJ1 J- J- . 

7\ 7\ rt 


tonLciiL. s 


RPTT 

J_> IT J. X 


i 

j- 


2 3 4 


5 




4 U 


Z 


POO A1 1 


A 


g 


3 


5 


10 


41 


— > 
3 


NzO Kll DZ 


XT 
In. 




A 
*± 


Q 
O 




42 


9 


All Ry o4 HZ JJ y iv IN 


fx. 




G 
O 


-J 




43 


2 


NJl CjZ 


NT 






Q 
O 




44 


J 


JNz ± Kll tv 


NT 






e 
o 




45 


2 


POO V 

r 3 z x 


TP 






o 
o 


15 


46 


<~> 

8 


Kz4 ILZ oz JJ ii V X K 


In. 






c: 
_j 




47 


2 


rp-i o PI/ 

i ± y o ±4t 






O 
O 


5 




4 8 


y 


/v± x i y Hi^t i z w z j_iz rc Jtx u 








o 
o 




A C\ 

4 y 


•"7 
/ 


HjXy Uo i-iZ yz xS-Z X ri 






2 


s 




O U 


b 


CjI D JJ1Z IjZ 1 v 1 ^ Iv 






g 


5 


2 0 


b 1 


± 




c 










DZ 


/ 


Kl j 1 V 11U J-i-3 Xii -5 ^ z n V 


M 




2 


5 




53 


8 


R2 1 Q3 hjZ riz Lz b K 1J 


p 
rc 




o 






54 


7 


T23 A3 V2 E2 I Y K 


T 






5 




55 


1 


C33 


C 






X 


25 


56 


8 


G15 V8 13 E2 R2 A L S 


G 










57 


8 


G19 V4 A3 P2 -2 R L N 


G 










58 


8 


All -10 P3 K3 S2 Y2 R F 


A 










59 


9 


-24 G2 Q E A Y S P R 












60 


6 


-28 Q R I G D 










30 


61 


3 


-31 T P 












62 


2 


-32 D 












63 


2 


-32 K 












64 


2 


-32 S 











s indicates secondary set

x indicates in or close to surface but buried
and/or highly conserved.



Table 35:
Distances from Cβ to
Tip of Side Group
in Å

Amino Acid type    Dist
A                  0.0
C (reduced)        1.8
D                  2.4
E                  3.5
F                  4.3
G
H                  4.0
I                  2.5
K                  5.1
L                  2.6
M                  3.8
N                  2.4
P                  2.4
Q                  3.5
R                  6.0
S                  1.5
T                  1.5
V                  1.5
W                  5.3
Y                  5.7

Notes: These distances were calculated for standard model
parts with all side groups fully extended.

Table 36: Distances, BPTI residue set #2
Distances in Å between Cβ

Hypothetical Cβ was added to each Glycine.





R17 




119 








A O *7 




pop 




T O Q 








1 Jt z 




XT'*. A. 




H*4 O 




119 


7 . 


7 






































Y21 


15 . 


1 


8 . 


4 


































A2 7 


22 . 


6 


17 . 


1 


12 . 


<-> 
2 






























G2 8 


26 . 


6 


20 . 


4 


13 . 


8 


5 . 


— > 

3 


























L2 9 


22 . 


5 


15 . 


8 


9 . 


6 


5 . 


1 


5 . 


2 






















Q31 


16 . 


1 


10 . 


4 


6 . 


8 


6 . 


8 


10 . 


6 


6 . 


8 


















T32 


11 . 


7 


5 . 


2 


6 . 


1 


12 . 


0 


15 . 


5 


10 . 


9 


5 . 


4 














V34 


5 . 


6 


6 . 


5 


11 . 


6 


17 . 


6 


2 1 . 


7 


18 . 


0 


11 . 


A 

4 


o 

O . 


2 










A4 8 


18 . 


5 


11 . 


0 


5 . 


4 


12 . 


6 


13 . 


3 


8 . 


4 


8 . 


8 


8 . 


3 


15 . 


<—> 

1 






E49 


22 . 


0 


14 . 


7 


8 . 


9 


16 . 


9 


16 . 


1 


12 . 


2 


13 - 


9 


13 . 


3 


19 . 


8 




5 


M52 


23 . 


6 


16 . 


3 


8 . 


6 


12 . 


2 


10 . 


3 


7 . 


6 


11 . 


3 


13 . 


2 


2 0 . 


0 


6 . 


2 


P9 


14 . 


, 0 


11 . 


3 


9 . 


0 


12 . 


2 


15 . 


4 


13 . 


3 


7 . 


9 


9 . 


2 


8 . 


7 


13 . 


9 


Til 


9 . 


. 5 


11 . 


. 2 


13 . 


, 5 


18 . 


, 8 


22 . 


5 


19 . 


8 


13 . 


5 


12 . 


1 


5 . 


7 


1 o 

18 . 


5 


K15 


7 . 


. 9 


14 . 


. 6 


20 . 


, 1 


27 , 


, 4 


31 . 


3 


27 . 


> 9 


21 . 


4 


18 . 


1 


10 . 


3 


24 . 


r~ 

D 


A16 


5 , 


. 5 


10 . 


. 1 


15 . 


. 9 


25 . 


. 2 


28 . 


5 


24 . 


. 6 


18 . 


6 


14 . 


5 


8 . 


6 


1 9 . 


o 
O 


118 


6 , 


. 1 


6 . 


. 0 


11 . 


. 2 


2 1 , 


. 3 


24 . 


. 4 


20 , 


, 2 


14 . 


7 


10 . 


4 


/ . 


0 


lb . 


0 


R2 0 


10 


. 6 


5 


. 9 


5 , 


. 4 


16 . 


. 0 


18 . 


. 5 


14 , 


. 6 


9 . 


8 


6 . 


, 9 


1 . 


8 


1U . 


2 


F22 


15 


. 6 


10 


. 9 


5 


. 6 


10 


, 5 


12 , 


. 8 


10 , 


. 3 


6 . 


2 


o 

O . 


. X 


1 U . 


o 
o 


1 U . 




N24 


19 


. 9 


14 


. 7 


9 


. 4 


4 


. 1 


7 . 


. 3 


6 , 


. 1 


4 . 


8 


1 0 . 


, 0 


14 . 


/ 


1 1 . 


A 


K26 


24 


. 4 


20 


. 1 


15 


. 2 


5 


. 4 


7 , 


. 7 


9 


. 8 


1 0 . 


, 1 


1 b , 


. o 


*1 Q 

iy . 


U 


1 / . 


U 


C3 0 


18 


. 9 


12 


. 1 


4 


. 6 


8 


. 8 


9 


. 5 


5 


. 3 


5 . 


, 9 


8 . 


. 2 


14 . 


9 


4 . 


9 


F3 3 


10 


. 8 


7 


.4 


7 


.7 


12 


.6 


16 


.4 


13 


. 0 


6 . 


. 6 


5 , 


.6 


5 . 


. 5 


12 . 


2 


Y3 5 


8 


.4 


7 


.4 


9 


.4 


18 


.4 


21 


.4 


17 


. 9 


12 . 


.2 


9 


. 5 


5 . 


. 8 


14 . 


4 


S47 


17 


. 6 


10 


.6 


6 


.6 


17 


.3 


17 


. 9 


13 


.4 


12 . 


.6 


10 


.4 


15 . 


. 9 


5. 


3 


D50 


20 


. 0 


13 


. 6 


7 


. 2 


17 


.2 


16 


. 8 


13 


. 5 


13 


.5 


12 


- 9 


17 , 


. 6 


7 . 


.6 


C51 


18 


. 9 


12 


.2 


4 


. 0 


12 


.1 


12 


.2 


8 


. 8 


8 


. 8 


9 


. 7 


15 


. 3 


5 . 


.4 


R53 


25 


.4 


18 


.6 


11 


. 0 


17 


.2 


15 


. 0 


13 


. 0 


15 


. 7 


16 


.7 


22 


. 3 


9, 


.7 


R3 9 


15 


.4 


16 


. 9 


17 


. 1 


24 


.9 


27 


.2 


24 


. 9 


20 


. 1 


18 


.7 


13 


. 8 


22 . 


.3 
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Table 36, continued. 
Distances in Å between Cβ.

Hypothetical Cβ was added to each Glycine.





E4 9 




M52 




P9 




Til 




K15 




Alo 








Kz U 




r A A 




"MO A 
vtA ft 




M52 


6 . 


1 






































P9 


17 . 


7 


15 . 


5 


































Til 


22 . 


1 


21 . 


5 


7 . 


2 






























K15 


27 . 


5 


28 . 


7 


16 . 


,4 


9 . 


5 


























A16 


22 . 


2 


24 . 


2 


14 . 


, 9 


9 . 


8 


6 . 


2 






















118 


17 . 


4 


19 . 


5 


12 . 


. 2 


9 . 


5 


10 . 


4 


4 . 


9 


















R2 0 


13 . 


0 


13 . 


8 


8 , 


. 0 


9 . 


4 


14 . 


9 


10 . 


6 


6 . 


2 














F22 
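
Each entry of Table 36 is a straight-line distance between two Cβ positions. A minimal Python sketch of that calculation is shown below. The coordinates are placeholders chosen only so the arithmetic is easy to follow; they are not taken from entry 6PTI, and the function name `pairwise_distances` is an illustrative assumption.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Map each unordered pair of labels to its Euclidean distance."""
    return {
        (a, b): math.dist(coords[a], coords[b])
        for a, b in combinations(coords, 2)
    }


# Placeholder coordinates in angstroms; NOT taken from entry 6PTI.
cb = {
    "R17": (0.0, 0.0, 0.0),
    "I19": (3.0, 4.0, 0.0),
    "Y21": (3.0, 4.0, 12.0),
}

for (a, b), d in pairwise_distances(cb).items():
    print(f"{a}-{b}: {d:.1f}")
```

With real Cβ coordinates (and a modeled Cβ for each glycine) the same loop would fill one triangle of a distance matrix.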
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Table 37: vgDNA to vary BPTI set #2.1 CS£tX tbM0:OOJ 

|g|p|c|k|a|X| 
I 35 | 36 | 37 t 38 | 39 | 40 | 
5 ' - 1 CAC I CCT | GGG | CCC I TGC 1 AAA | GCG | qf k | 208 
| spacer | Apa I | 



|i|x|r|y|f|y|n|a|k| 
10 | 4l| 42 | 43 | 44 | 45 j 46 | 47 | 48 | 49 | 

| ATC | qf k | CGT 1 TAT | TTC | TAC I AAC | GCT | AAA | 235 

/ 3 ' = olig#27 72 nts 

+ ! + | + 

15 |x|g|x|c|q|t|f|x|y|g|g| 
| 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 
| qf k 1 GGt 1 qf k | TGC | CAG | ACC | TTc | qf k j TAC | GGT | GGT | 2 68 

olig#28= 3'- acg gtc tgg aag **m atg cca cca 
78 nts CSgQ t& MO: 

20 

Overlap =12 (7 CG, 5 AT) 

| c | r | a | k.| r | n| n | f | k | 
j 6l| 62 | 63 | 64 | 65 j 66 j 67 j 68 | 69 | 
2 5 | TGC | CGT | GCT | AAG | CGT j AAC | AAC j TTT j AAA j 2 95 

acg gca cga ttc gca ttg ttg aaa ttt 
I Esp I 1 



30 | s | X | e | d | c | m | 
| 70 j 71 | 72 | 73 j 74 | 75 j 

| TCT | qf k | GAG | GAT | TGC j ATG | C 322 
age **m etc eta acg tac gca ccc acc -5' 

| Sph I | spacer 1 

k = equal parts of T and G; m = equal parts of C and A;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue            40     42     50     52     57     71

Possibilities      21  x  21  x  21  x  21  x  21  x  21  = 8.6 x 10^7

Abundance x 10
of PPBD          .768   .271   .459   .671   .600   .459

Product = 1.77 x 10^-8

Parent = 1/(5.5 x 10^7)    least favored = 1/(4.2 x 10^9)

Least favored one-amino-acid substitution from PPBD present at 1 in
1.6 x 10^7
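
The figures at the foot of Table 37 can be checked with a few lines of arithmetic. The sketch below is illustrative only (the variable names are not from the source): it counts the outcomes at six varied codons, assuming 20 amino acids plus stop at each, and multiplies the per-residue parental abundances, which the table lists scaled by 10.

```python
import math

# Per-residue abundance of the parental amino acid as listed in Table 37.
# The table gives each value multiplied by 10.
abundance_x10 = [0.768, 0.271, 0.459, 0.671, 0.600, 0.459]

# Each varied codon can yield 20 amino acids or a stop: 21 outcomes.
possibilities = 21 ** len(abundance_x10)

# Undo the factor of 10 at each residue, then multiply across residues.
parent_fraction = math.prod(a / 10 for a in abundance_x10)

print(f"possibilities   = {possibilities:.2e}")
print(f"parent fraction = {parent_fraction:.3g}")
print(f"parent is 1 in  {1 / parent_fraction:.3g}")
```

The product comes out near 1.77 x 10^-8, in line with the table; the reciprocal differs slightly from the printed 5.5 x 10^7 because the listed abundances are rounded.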
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Table 38: Result of varying set#2 of BPTI 2.1 

I 1 I e | 



| 291 301 

|CTC|GAG| 17 8 

I Ava I I 
| Xho I I 



|p|p|y|t|glplc|k|a|DI 
| 311 321 33! 341 35 1 36| 37 I 38 | 391 40| 
| CCG | CCA | TAT | ACT I GGG | CCC | TGC | AAA | GCG | GAT | 
I PflM I 1 



1 Apa 


I 1 


| Dra 


II 1 


| Pss 


I 1 



ii|Q|r|y|flyl n l a|k| 
| 41| 42| 431 44| 451 46| 47 I 48 | 49| 
| ATC | CAG | CGT | TAT | TTC | TAC | AAC | GCT I AAA | 

I E I g | l I c I q I t I f I s I y I g I g I 
50 51 521 53! 54| 55 I 561 57 | 58 I 591 60 1 
| GAG | GGC I CTG | TGC I CAG | ACC | TTT | TCG | TAC | GGT | GGT 

|c|r|a|k|r|n|n|f|k| 

I 611 62 1 63| 64| 65 I 66 1 67 | 68 I 69 1 
| TGC | CGT | GCT | AAG | CGT | AAC | AAC I TTT | AAA | 

I Esp I I 

ls|W|e|d|c|m|r|t|c|gl 
I 701 711 721 731 74| 751 76 1 77 | 78 I 79 I 
| TCG | TGG | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT | 

I Sph I I 



208 



235 



268 



295 



325 



I g I a I 
| 801 811 
| GGC | GCC | 
I Bbe I 1 
| Nar I |
Table 39: vgDNA to vary set#2 BPTI 2.2

1 9 I P I c | X | a I D | 
j 35 | 36 1 37 | 38 j 39 1 40 | 
5 5' - eg gca cgc I GGG | CCC | TGC | mr A 1 GCG 1 GAT | , 208 
1 spacer I Apa I | 

+ + + 

■ | X | Q | X | X | f | y | n | a | k | 

| 41| 42 j 43 j 44| 45 1 46 j 47 j 48 | 49 | 
10 1 rwA | C AG 1 rvk | TwT | TTC | TAC 1 AAC | GCT | AAA] 235 

+ + + 

|E|x|L|c|x|x|f|s|y|g|g| 

| 50 | 5l| 52 I 53 | 54| 55 j 56 j 57 j 58 | 59 | 60 | 
15 | GAG 1 qf k 1 CTG | TGC | qf k | qf k | TTT | TCG | TAC | GGT 1 GG T | 268 

fe^l nts olig#3 0 3'- g cca cca 

Overlap =15 (11 CG, 4 AT) ~ ~ 

20 /- 3- olig#29 94 nts l> gQ - ^ " ^ 

| c | r | a | k | r | n | n | f | k | 
| 61 | 62 | 63 | 64 j 65 j 66 j 67 | 68 | 69 | 

| TGC 1 CGT | GC T 1 AAG | CGT | AAC | AAC | TTT | AAA j 295 
acg gca cga ttc gca ttg ttg aaa ttt 
25 I Esp I | 

+ 

| s | W | X | d | c | m | 
| 70 | 7l| 72 | 73 | 74| 75 j 
| TCG | TGG | qf k | GAT | TGC | ATG | C 
30 age acc **m eta acg tac gcg acc tgc -5* 

1 Sph I] spacer | 

k = equal parts of T and G; v = equal parts of C, A, and G;
m = equal parts of C and A; r = equal parts of A and G;
w = equal parts of A and T;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue            38     41     43     44     51     54     55     72

Possibilities       4  x   4  x   9  x   2  x  21  x  21  x  21  x  21  = 6.2 x 10^7

Abundance x 10    2.5    2.5   .833    5.    .663   .397   .437   .602

Product = 2.3 x 10^-8

Parent = 1/(4.4 x 10^7)    least favored = 1/(1.25 x 10^9)

Least favored one-amino-acid substitution from PPBD present at 1
in 1.2 x 10^7
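
The legend defines each variegated codon as a per-position nucleotide mixture, so the amino-acid distribution it encodes can be enumerated directly. The following is a minimal sketch, assuming the standard genetic code; the helper names `mixture` and `aa_distribution` are hypothetical, and it is not claimed to reproduce the abundance row above.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AA[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Nucleotide mixtures taken from the table legend.
MIXTURES = {
    "q": {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30},
    "f": {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22},
    "k": {"T": 0.5, "G": 0.5},
    "m": {"C": 0.5, "A": 0.5},
    "r": {"A": 0.5, "G": 0.5},
    "w": {"A": 0.5, "T": 0.5},
    "v": {"C": 1 / 3, "A": 1 / 3, "G": 1 / 3},
}


def mixture(symbol):
    """Return the base distribution for a plain base or a legend symbol."""
    if symbol.upper() in BASES:
        return {symbol.upper(): 1.0}
    return MIXTURES[symbol]


def aa_distribution(codon):
    """Probability of each amino acid (or '*') encoded by a variegated codon."""
    pools = [mixture(s) for s in codon]
    dist = defaultdict(float)
    for combo in product(*(pool.items() for pool in pools)):
        triplet = "".join(base for base, _ in combo)
        weight = 1.0
        for _, w in combo:
            weight *= w
        dist[CODE[triplet]] += weight
    return dict(dist)


for codon in ("mrA", "rwA", "rvk", "TwT", "qfk"):
    dist = aa_distribution(codon)
    print(codon, len(dist), "outcomes")
```

Under these assumptions `rvk` yields 9 amino acids, `TwT` yields 2, and `qfk` yields all 20 amino acids plus the amber stop, matching the corresponding entries of the "Possibilities" row.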
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Table 40: Result of varying set#2 of BPTI 2.2 



I 1 I e ! 
I 29| 30| 

|CTC|GAG| 17 8 

I Xho I I 



lplply|t|g|p|c|E|a|D| 
I 31| 32| 33| 34| 35 | 36 | 37 | 38 | 39 | 40 | 
I CCG | CCA | TAT | ACT | GGG | CCC | TGC | GAG | GCG | GAT | 208 

I PflM I I 

I Apa I | 

|V|Q|N|F|f|y|n|a|k| 
I 41| 42| 43| 44| 45| 46 I 47 | 48| 49| 

I GTT | CAG | AAT | TTT | TTC | TAC | AAC | GCT | AAA | 2 3 5 



|E|F|L|c|S|A|f|S|y|g|g| 
I 50| 51| 52| 531 541 55 | 56 | 57 | 58 | 59 | 60 | 
I GAG | TTT | CTG | TGC | TCT | GCT | TTT | TCG | TAC | GGT | GGT | 2 68 



Ic|r|a |k|r|n|n|f|k| 
I 61| 62| 63| 64| 65| 66| 67| 68| 69| 

I TGC | CGT | GCT | AAG | CGT | AAC | AAC | TTT | AAA | 2 95 

I Esp I | 



|s|W|Q|d|c|m|r|t|c|g| 
I 70| 71| 721 731 74| 75 I 76 I 77 | 78 I 79| 
I TCG | TGG | CAG | GAT | TGC | ATG | CGT | ACC | TGC | GGT | 325 

I Sph I| 



I g I a | 
I 80| 81| 
| GGC | GCC | 
I Bbe I | 
I Nar I | 



Table 41: vgDNA set #2 of BPTI 2.3

I 1 I e | 
| 29 | 30 | 

5 5'- eg age ctg | CTC 1 GAG | 178 

| spacer ] Xho I [ 

+ + + 

|p|X|y|X|g|p|c|E|a|X| 
10 j 31 j 32 j 33 I 34 j 35 j 36 j 37 j 38 j 39| 40 j 

| CCG | vmg | TAT [ wig [ GGG | CCC | TGC | GAG | GCG | qf k | 2 08 

+ 

|v|Q|N|X|f|y|n|a|k| 
15 j 4l| 42 j 43 j 44 j 45 j 46 j 47 j 48 j 49 | > 
| GTT 1 CAG | AAT | Tdk | TTC | TAC | AAC | GCc | AAg | -3 1 olig#33 71 nts CSfcflt |D MQl I M J 

67 gt| olig#34 3 ^ g atg ttg egg ttc * ' 



Overlap .3 (7 CG, 6 AT) 



+ + + + 

|X|F|X|c|S|x|f|x|y|g|g| 
j 50 j 51 j 52 | 53 | 54 j 55 j 56 | 57 j 58 j 59 | 60 j 
j vAG j TTT j nTk | TGC | TCT | qf k j TTT | qf k | TAC j GGT j GGT | 268 
2 5 btc aaa nam acg aga **m aaa **m atg cca cca 

| c | r | a | k | 
| 61 j 62 | 63 | 64 | 
j TGC j CGT j GCT j AAG j C 
30 acg gca cga ttc gcg acc ggc 
j Esp I | spacer [ 

k = equal parts of T and G; m = equal parts of C and A;
w = equal parts of A and T; n = equal parts of A, C, G, T;
d = equal parts A, G, T; v = equal parts A, C, G;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue            32     34     40     44     50     52     55     57

Possibilities       6  x   6  x  21  x   6  x   3  x   5  x  21  x  21  = 3 x 10^7

Abundance x 10
of PPBD          10/6   10/6   .545   10/6   10/3   30/8   .459   .701

Product = 1.01 x 10^-7

Parent = 1/(1 x 10^7)    least favored = 1/(4 x 10^8)

Least favored one-amino-acid substitution from PPBD present at 1
in 3 x 10^7
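
The "Possibilities" row can be reproduced by expanding each equimolar degenerate codon and collecting the distinct outcomes. This is a sketch under stated assumptions: the standard genetic code is used, `expand` and `outcomes` are hypothetical helper names, and the codon at residue 34, which is garbled in the table, is not expanded here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AA[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Equimolar mixtures from the table legend.
DEGENERATE = {"k": "TG", "m": "CA", "w": "AT",
              "n": "ACGT", "d": "AGT", "v": "ACG"}


def expand(codon):
    """List every concrete codon encoded by a degenerate codon."""
    pools = [DEGENERATE.get(s, s.upper()) for s in codon]
    return ["".join(t) for t in product(*pools)]


def outcomes(codon):
    """Distinct amino acids (or '*') encoded by a degenerate codon."""
    return sorted({CODE[c] for c in expand(codon)})


for codon in ("vmg", "Tdk", "vAG", "nTk"):
    print(codon, len(outcomes(codon)), "".join(outcomes(codon)))

# Share of leucine codons at nTk, the basis of the 30/8 entry.
ntk = expand("nTk")
leu_share = sum(CODE[c] == "L" for c in ntk) / len(ntk)
print("L share at nTk:", leu_share)

# Product of the "Possibilities" row.
total = 6 * 6 * 21 * 6 * 3 * 5 * 21 * 21
print(f"total = {total:.1e}")
```

The counts 6, 6, 3, and 5 for `vmg`, `Tdk`, `vAG`, and `nTk` agree with the table, as does the overall product of about 3 x 10^7; note that the `Tdk` count includes the amber stop.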
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Table 42: Result of varying set#2 of BPTI 2.3 

I 1 I e I 
| 29| 30| 

|CTC|GAG| 178 
I Ava I | 
I Xho I | 



lp|E|y|Q|g|p|c|E|a|A| 
I 31| 32| 33| 34| 35 I 36 I 37 | 38 | 39 | 4 0 | 

I CCG | GAG | TAT | CAG | GGG | CCC | TGC | GAG | GCG | GCT | 208 

I Apa I | 

|V|Q|N|W|f|y|n|a|k| 
I 411 421 431 44| 45 I 46 I 47 1 48 I 4 9 I 

I GTT | CAG | AAT | TGG | TTC | TAC | AAC | GCT | AAA | 235 



|Q|F|M|c|S|L|f|H|y|g|g| 
I 50| 51| 52| 53| 54| 55 | 56| 57 | 58 I 59 | 60 | 
I CAG | TTT | ATG I TGC I TCT | CTT | TTT | CAT | TAC I GGT I GGT I 268 



|c|r|a|k|r|n|n|f|k| 
I 61| 62| 63| 64| 65 | 66 | 67 | 68 | 69 | 

I TGC | CGT | GCT | AAG | CGT | AAC | AAC | TTT | AAA | 295 
I Esp I | 



|s|W|Q|d|c|m|r|t|c|g| 
I 70| 71| 72| 73| 74| 75 | 76 | 77 | 78 | 7 9 | 

I TCG | TGG | CAG | GAT | TGC | ATG I CGT | ACC | TGC | GGT | 325 

I Sph I| 



I g I a | 
I 801 811 
I GGC | GCC | 
I Bbe I | 
| Nar I |
Table 101a: VIII-signal::bpti::VIII-coat gene

pbd modi 4 : 9 V 89 : Sequence cloned into pGEM-MB1

pGEM-3Zf(-)[HincII]::lacUV5 SacI/gene/
TrpA attenuator/(SalI)::pGEM-3Zf(-)[HincII] !



5'-(GAATTC GAGCTCGGTACCCGG GGATCC TCTAGAGTC)     ! polylinker
   GGC tttaca CTTTATGCTTCCGGCTCG tataat GTG      ! lacUV5
   TGG aATTGTGAGCGcTcACAATT                      ! lacO-symm
   gagctc AG(G)AGG CttaCT                        ! SacI; Shine-Dalgarno seq. a
   atg aag aaa tct ctg gtt ctt aag gct agc       ! 10, M13 leader
   gtt gct gtc gcg acc ctg gta cct atg ttg       ! 20 <- codon #
   tcc ttc gct cgt ccg gat ttc tgt ctc gag       ! 30
   cca cca tac act ggg ccc tgc aaa gcg cgc       ! 40
   atc atc cgC tat ttc tac aat gct aaa gca       ! 50
   ggc ctg tgc cag acc ttt gta tac ggt ggt       ! 60
   tgc cgt gct aag cgt aac aac ttt aaa tcg       ! 70
   gcc gaa gat tgc atg cgt acc tgc ggt ggc       ! 80
   gcc gct gaa ggt gat gat ccg gcc aaG gcg       ! 90
   gcc ttc aat tct ctG caa gct tct gct acc       ! 100
   gag tat att ggt tac gcg tgg gcc atg gtg       ! 110
   gtg gtt atc gtt ggt gct acc atc ggg atc       ! 120
   aaa ctg ttc aag aag ttt act tcg aag gcg       ! 130
   tct taa tga tag GGTTACC                       ! BstEII
   AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT       ! terminator
   aTCGA-                                        ! (SalI ghost)
   (GACCTGCAGGCATGCAAGCTT)-3'                    ! pGEM polylinker



30 



Notes : 

a Designed sequence contained AGGAGG, but sequencing indicates 
that actual DNA contains AGAGG. 



Table 101b: VIII-signal::bpti::VIII-coat gene



BamHI-SalI cassette, after insertion of SalI linker
in PstI site of pGEM-MB1.
pGEM-3Zf(-)[HincII]::lacUV5 SacI/gene/
TrpA attenuator/(SalI)::pGEM-3Zf(-)[HincII] !
5'-GAATTC GAGCTC GGTACCCGG GGATCC TCTAGA GTC-    ! BamHI
   GGC tttaca CTTTATGCTTCCGGCTCG tataat GTG      ! lacUV5
   TGG aATTGTGAGCGcTcACAATT                      ! lacO-symm operator
   gagctc AGAGG CttaCT                           ! SacI; Shine-Dalgarno seq.
   atg aag aaa tct ctg gtt ctt aag gct agc       ! 10, M13 leader
   gtt gct gtc gcg acc ctg gta cct atg ttg       ! 20 <- codon #
   tcc ttc gct cgt ccg gat ttc tgt ctc gag       ! 30
   cca cca tac act ggg ccc tgc aaa gcg cgc       ! 40
   atc atc cgC tat ttc tac aat gct aaa gca       ! 50
   ggc ctg tgc cag acc ttt gta tac ggt ggt       ! 60
   tgc cgt gct aag cgt aac aac ttt aaa tcg       ! 70
   gcc gaa gat tgc atg cgt acc tgc ggt ggc       ! 80
   gcc gct gaa ggt gat gat ccg gcc aaG gcg       ! 90
   gcc ttc aat tct ctG caa gct tct gct acc       ! 100
   gag tat att ggt tac gcg tgg gcc atg gtg       ! 110
   gtg gtt atc gtt ggt gct acc atc ggg atc       ! 120
   aaa ctg ttc aag aag ttt act tcg aag gcg       ! 130
   tct taa tga tag GGTTACC                       ! BstEII
   AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT       ! terminator
   aTCGA GACctgca GGTCGACC ggcatgc-3'
                  | SalI |



Table 102a: Annotated Sequence of gene
found in pGEM-MB1



5 ' - (G GATCC TCTAGA GTC) GGC- 
from pGEM polyl inker 

tttaca CTTTATGCTTCCGGCTCG tataat GTGTGG- 
ZJs~ lacUV5 - 10 



35 



40 



M13/BPTI Jnct j Xho I 1 

I p I p I Y I T | G | P | C | K | A | R I 
W\ 32 33 34 35 36| 37 | 38 1 39| 40 
CCA cSUclACrlG^ICCCl^lAAAl^lCOC- 

I Pf lM I 1 I 1 JBSBH 11 . 

"|Apa I 1 1 

I Dra II 1 
I Pss I I 



number 



39 



59 



^ p TTftTCt AGCGcTcACAATT - 
lacO-symm operator 

15 gagctc AG (G) AGG & CttaCT- 

Sac i Shine-Dalgarno seq. 

IfM I K I K I S I L | V | L | K | A | S I 

ivIaIvIa|t|l|v|p|m|l.| 

1 \ Nru II J_J<PJl_ll 

I e I p I A I R I P I D | F | C | L | E j 

1 ' 1 lAccIH' I Ava J - 1 



197 




Table 102a : Annotated Sequence 
of gene found in pGEM-MBl 
(continued) 



|I|I|R|Y|F|Y|N|A|K|A 
I 41 | 42 | 43 | 44 | 45 j 46 j 47 | 48 j 49 j 50 
ATC | ATC CGC TAT TTC TAC AAT GCT AAA GC 



226 



|G|L|C|Q|T|F | V | Y | G | G | 
10 | 51 j 52 | 53 j 54 j 55 | 56 j 57 | 58 j 59 j 60 | 
A | GGC | CTG | TGC | CAG | ACC | TTT | GTA j TAC j GGT j GGT j 
| Stu I | I Acc I 1 



Xca I 



257 



15 | C | R | A | K | R | N | N | F | K | 
| 61 j 62 | 63 | 64 j 65 j 66 j 67 j 68 j- 69 j 
| TGC | CGT j GCT | AAG | CGT j AAC j AAC | TTT j AAA j - 
I Esp I L 

20 |S|A|E|D|C|M|R|T|C|G| 
| 70 | 71 j 72 | 73 j 74 j 75 j 76 | 77 j 78 j 79 j 
| TCG | GCC | GAA | GAT | TGC | ATG j CGT | ACC j TGC j GGT j 
IXmalll 1 I Sph l| 



284 



314 



25 



BPTI/M13 boundary 



30 



35 



40 



45 



1 G | 


A | 


1 80 1 


81| 


|GGC| 


GCC|( 


| Bbe 


I 1 


| Nar 


I I 


1 P 1 


N | 


1 92 1 


93 | 


|TTC| 


AAT | 


1 E | 


Y | 


1 101 1 


102 | 


| GAG j 


TAT| 


1 A | 


M | 


| 108 


109 | 


| GCC 


ATG | 



E 



K 



Sfi I 



rjGCG|GCcj- 350 



94 



L 
95 



Q 
96 



A 
97 



S 

98 



A I 
99 



T 
100 



1 Hind 3 1 
I | G | Y | A | W | 



V 



BstX I 



V 



V 



V 



1 Nco I 1 



377 



398 



425 
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Table 102a : Annotated Sequence 
of gene found in pGEM-MBl 
(continued) 



| T | I | G | I | 
|117|118|119|120| 

| ACC | ATC | GGG | ATC | - 437 



|K|L|F|K|K|F|T|S|K|A| 
| 121 | 122 | 123 j 124 | 12 5 | 126 | 127 | 128 j 12 9 j 13 0 j 
| AAA | CTG | TTC | AAG | AAG | TTT | ACT | TCG j AAG | GCG | - 4 67 

lAsu II | 

I S | . | : | . | 
| 131 | 132 | 133 | 134 | 

|TCT|TAA|TGA|TAG| GGTTACC - 486 

BstE II 



AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT- 521 
terminator 



aTCGA (GACctgcaggcatgc) -3 ' 
( Sai l ) from pGEM polyl inker 



Notes : 

a Designed called for Shine-Dalgarno sequence, AGGAGG, 
but sequencing shows that actual constructed gene contains 
AGAGG. 

Note the following enzyme equivalences,

Xma III = Eag I          Acc III = BspM II
Dra II  = EcoO109 I      Asu II  = BstB I
Table 102b: Annotated Sequence of gene
after insertion of SalI linker

nucleotide
number



5'-(GGATCC TCTAGA GTC) GGC-
     from pGEM polylinker

tttaca CTTTATGCTTCCGGCTCG tataat GTGTGG-                   39
 -35         lacUV5        -10

aATTGTGAGCGcTcACAATT-                                      59
lacO-symm operator

gagctc AGAGG CttaCT-                                       77
 SacI  Shine-Dalgarno seq.

| fM| K | K | S | L | V | L | K | A | S |
|  1|  2|  3|  4|  5|  6|  7|  8|  9| 10|
|ATG|AAG|AAA|TCT|CTG|GTT|CTT|AAG|GCT|AGC|-                107
                        | Afl II| Nhe I |

| V | A | V | A | T | L | V | P | M | L |
| 11| 12| 13| 14| 15| 16| 17| 18| 19| 20|
|GTT|GCT|GTC|GCG|ACC|CTG|GTA|CCT|ATG|TTG|-                137
        |   Nru I   |   Kpn I   |

| S | F | A | R | P | D | F | C | L | E |
| 21| 22| 23| 24| 25| 26| 27| 28| 29| 30|
|TCC|TTC|GCT|CGT|CCG|GAT|TTC|TGT|CTC|GAG|-                167
            ^|  AccIII   |       | Ava I |
  M13/BPTI Jnct                  | Xho I |

| P | P | Y | T | G | P | C | K | A | R |
| 31| 32| 33| 34| 35| 36| 37| 38| 39| 40|
|CCA|CCA|TAC|ACT|GGG|CCC|TGC|AAA|GCG|CGC|-                197
    |    PflM I     |           |BssH II|
                | Apa I |
                |  Dra II   |
                |  Pss I    |
Table 102b: Annotated Sequence
of gene after insertion of SalI linker
(continued)



| I | I | R | Y | F | Y | N | A | K | A
| 41| 42| 43| 44| 45| 46| 47| 48| 49| 50
|ATC|ATC|CGC|TAT|TTC|TAC|AAT|GCT|AAA|GC                   226

  | G | L | C | Q | T | F | V | Y | G | G |
  | 51| 52| 53| 54| 55| 56| 57| 58| 59| 60|
 A|GGC|CTG|TGC|CAG|ACC|TTT|GTA|TAC|GGT|GGT|-              257
 | Stu I |               | Acc I |
                         | Xca I |

| C | R | A | K | R | N | N | F | K |
| 61| 62| 63| 64| 65| 66| 67| 68| 69|
|TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT|AAA|-                    284

| S | A | E | D | C | M | R | T | C | G |
| 70| 71| 72| 73| 74| 75| 76| 77| 78| 79|
|TCG|GCC|GAA|GAT|TGC|ATG|CGT|ACC|TGC|GGT|-                314
 |XmaIII |       | Sph I |

BPTI/M13 boundary
| G | A | A | E | G | D | D | P | A | K | A | A |
| 80| 81| 82| 83| 84| 85| 86| 87| 88| 89| 90| 91|
|GGC|GCC|GCT|GAA|GGT|GAT|GAT|CCG|GCC|AAG|GCG|GCC|-        350
| Bbe I |                     |      Sfi I        |
| Nar I |

| F | N | S | L | Q | A | S | A | T |
| 92| 93| 94| 95| 96| 97| 98| 99|100|
|TTC|AAT|TCT|CTG|CAA|GCT|TCT|GCT|ACC|-                    377
                 |Hind 3 |

| E | Y | I | G | Y | A | W |
|101|102|103|104|105|106|107|
|GAG|TAT|ATT|GGT|TAC|GCG|TGG|-                            398

| A | M | V | V | V | I | V | G | A |
|108|109|110|111|112|113|114|115|116|
|GCC|ATG|GTG|GTG|GTT|ATC|GTT|GGT|GCT|-                    425
|     BstX I      |
| Nco I   |
Table 102b: Annotated Sequence
after insertion of SalI linker
(continued)



I T | I | G | I | 
| 117 | 118 | 119 | 120 | 
| ACC | ATC | GGG | ATC j - 



|k|l|f|k|k|f|t|s|k|a| 

| 121 1 122 I 12 3 j 124 | 12 5 j 12 6 | 12 7 | 12 8 j 12 9 j 13 0 | 
j AAA | CTG | TTC j AAG j AAG | TTT j ACT | TCG j AAG j GCG j 

|Asu Il[ 

15 

I S | . | . | . | 

| 131 1 132 | 133 | 134 | 
| TCT j TAA | TGA j TAG j GGTTACC - 

Bst E II 

20 



AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT- 
terminator 

25 

aTCGA GACctgca GGTCGACC ggcatgc-3 1 

| Sail | 

Note the following enzyme equivalences, 

30 

Xma III = Eag I Acc III 

Dra II = EcoO109 I Asu II 
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Table 102 : Annotated Sequence 
of osp-ipbd gene 
(continued) 

Table 102c: Calculated properties of Peptide

For the apoprotein

Molecular weight of peptide = 16192
Charge on peptide = 9
[A+G+P] = 36
[C+F+H+I+L+M+V+W+Y] = 48
[D+E+K+R+N+Q+S+T+.] = 48

For the mature protein

Molecular weight of peptide = 13339
Charge on peptide = 6
[A+G+P] = 31
[C+F+H+I+L+M+V+W+Y] = 37
[D+E+K+R+N+Q+S+T+.] = 41
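
The grouped residue counts above can be recomputed from the encoded polypeptide. The sketch below is an illustration, not the method used in the source: the sequence string was transcribed by hand from the amino-acid rows of Table 102 (codons 1-131), the signal peptide is assumed to end at the M13/BPTI junction after codon 23, and the stop symbol "." is assumed to be counted once in the third group.

```python
from collections import Counter

# Encoded polypeptide, codons 1-131, transcribed from Table 102.
ENCODED = (
    "MKKSLVLKAS" "VAVATLVPML" "SFARPDFCLE" "PPYTGPCKAR" "IIRYFYNAKA"
    "GLCQTFVYGG" "CRAKRNNFK" "SAEDCMRTCG" "GAAEGDDPAKAA" "FNSLQASAT"
    "EYIGYAW" "AMVVVIVGA" "TIGI" "KLFKKFTSKA" "S"
)

SIGNAL_LENGTH = 23            # assumed cleavage after codon 23
MATURE = ENCODED[SIGNAL_LENGTH:]

GROUPS = {
    "A+G+P": "AGP",
    "C+F+H+I+L+M+V+W+Y": "CFHILMVWY",
    "D+E+K+R+N+Q+S+T+.": "DEKRNQST",
}


def group_counts(seq):
    """Count residues per group; the stop codon adds one to the last group."""
    counts = Counter(seq)
    result = {name: sum(counts[aa] for aa in members)
              for name, members in GROUPS.items()}
    result["D+E+K+R+N+Q+S+T+."] += 1
    return result


print("apoprotein:", group_counts(ENCODED))
print("mature:    ", group_counts(MATURE))
```

With these assumptions the group totals agree with the table for both the apoprotein and the mature protein. Molecular weight and net charge are not recomputed here because the source does not state the conventions used for them.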



Table 102d: Codon Usage

First          Second Base           Third
Base       t     c     a     g       base

 t         3     4     2     1        t
           5     1     4     5        c
           0     0     0     0        a
           1     2     0     1        g

 c         1     1     0     4        t
           1     1     0     2        c
           0     2     1     0        a
           5     2     1     0        g

 a         1     2     2     0        t
           5     5     2     1        c
           0     0     5     0        a
           4     0     7     0        g

 g         4     9     4     6        t
           1     5     0     2        c
           2     1     2     0        a
           2     5     2     2        g
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Table 102e: Amino-acid frequency- 
Encoded polypeptide

AA   #     AA   #     AA   #     AA   #
A   20     C    6     D    4     E    4
F    8     G   10     H    0     I    6
K   12     L    8     M    4     N    4
P    6     Q    2     R    6     S    8
T    7     V    9     W    1     Y    6
.    1

Mature protein

AA   #     AA   #     AA   #     AA   #
A   16     C    6     D    4     E    4
F    7     G   10     H    0     I    6
K    9     L    4     M    2     N    4
P    5     Q    2     R    6     S    5
T    6     V    5     W    1     Y    6



Table 102f: Enzymes used to manipulate BPTI-gp8 fusion

SacI                  GAGCT | C
AflII                 C | TTAAG
NheI                  G | CTAG C
NruI                  TCG | CGA
KpnI                  G GTAC | C
AccIII = BspMII       T | CCGGA
AvaI                  C | yCGr G
XhoI                  C | TCGA G
PflMI                 CCAn nnn | nTGG
BssHII                G | CGCG C
ApaI                  G GGCC | C
DraII = EcoO109I      rG GnC | Cy      (Same as PssI)
StuI                  AGG | CCT
AccI                  GT | mkA C
XcaI                  GTA | TAC
EspI                  GC | TnA GC
XmaIII                C | GGCC G
SphI                  G CATG | C
BbeI                  G GCGC | C       (Supplier ?)
NarI                  GG CG | CC
SfiI                  GGCCn nnn | nGGCC
HindIII               A | AGCT T
BstXI                 CCA nnnnn | nTGG
NcoI                  C | CATG G
AsuII = BstBI         TT | CGA A
BstEII                G | GTnAC C
SalI                  G | TCGAC



(Supplier ?) 
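
Recognition sequences written with ambiguity letters (r, y, m, k, n) can be located in the gene by converting each one to a regular expression. The sketch below is an illustration under stated assumptions: `find_sites` is a hypothetical helper, only the top strand is searched (the sites shown are palindromic), and the test segment is codons 28-40 of the gene as given in Table 102.

```python
import re

# Ambiguity codes used in the recognition sequences of Table 102f.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]", "N": "[ACGT]",
}

# Recognition sequences (cut positions omitted).
SITES = {
    "AvaI": "CYCGRG",
    "XhoI": "CTCGAG",
    "PflMI": "CCANNNNNTGG",
    "ApaI": "GGGCCC",
    "DraII": "RGGNCCY",
    "BssHII": "GCGCGC",
}

# Codons 28-40: TGT CTC GAG CCA CCA TAC ACT GGG CCC TGC AAA GCG CGC
SEGMENT = "TGTCTCGAGCCACCATACACTGGGCCCTGCAAAGCGCGC"


def find_sites(seq, site):
    """Return 0-based start positions of every (possibly overlapping) match."""
    pattern = "".join(IUPAC[base] for base in site.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


for enzyme, site in SITES.items():
    print(f"{enzyme:7s} {site:12s} {find_sites(SEGMENT, site)}")
```

Each of the six enzymes is found exactly once in this segment, which is consistent with the site annotations drawn under codons 29-40 in the annotated sequence tables.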






Table 103 : Annotated Sequence of osp-ipbd gene 

Underscored bases indicate sites of overlap between annealed 
synthetic, duplexes. 



5' - 

/GGC tttaca CTTTAT , GCTTCCGGCTCG tataat GTGTGG- 
lacUV5 



aATTGTGAGCGcTcACAATT- 
lacO-symm operator 



gagctc AG ( G ) /AGG CttaCT- 

Sac I Shine-Dalgarno seq. 



I f M | K |K|S|L|V|L|K|A|S| 
1112 |3|4|5|6|7|8|9|10| 
I ATG | AAG , | AAA | TCT | CTG | GTT | CTT | AAG I GCT | AGC | 

I Afl III Nhe I I 



|V|A|V|A|T|L|V|P|M| L | 
I 111 12| 13| 14| 15| 16| 17| 18| 19 | 20 | 
I GTT | GCT | GTC I GCG I ACC I CTG I GTA I CCT | ATG | T /TG | - 
I Nru I | | Kpn I| 



I S | F | A | R|P|D|F|C|L|E| 
I 21| 22| 23| 24| 25 | 26 1 27 | 28 I 29 1 30 I 
|TCC|TTC|GCT|CG , T | CCG | GAT | TTC | TGT | CTC | GAG | - 

t | AccIII | | Ava I 1 

M13/BPTI Jnct | Xho I | 
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Table 103 : Annotated Sequence 
of osp-ipbd gene 
(continued) 



|P|P|Y|T|G|P 
I 31 | 32 | 3 3 | 34 | 35 | 36 
| CCA I CCA | TAC | ACT | GGG | CCC 
| PflM I _[ 

I Apa I 



Dra II 



| Pss I 



C I K | A I R 
37 | 38 | 39 j 40 
TGC j AAA | GCG j CGC 
I BssH II 



| I | I | R | Y | F | Y | N | A | K | A 
| 4l| 42 j 43 j 44 | 45 | 46 j 47 | 48 j 4 9 j 50 
| ATC j ATC j CG /C [ TAT | TTC | TAC | AAT | GC , T | AAA | GC 



|G|L|C| Q|T|F|V|Y|G|G| 
| 51 j 52 j 53 j 54 j 55 j 56 j 57 j 58 j 59 j 60 j 
A | GGC | CTG j TGC | CAG | ACC | TTT j GTA j TAC j GGT j GGT j - 
1 Stu I 1 | Acc I 1 

| Xca I | 



| C | R | A | K | R | N | N | F | K | 
j 61 | 62 "j 63 | 64 j 65 j 66 j 67 | 68 j 69 j 
| TGC | CGT j GCT | AAG j CGT j /AAC | AAC 1 TTT [ AAA | - 
I Esp I | 

| s |a|e|d|c|m|r|t|c|g| 

j 70 j 7l| 72 j 73 j 74 j 75 | 76 | 77 j 78 j 79 | 
[TCG, j GCC | GAA | GAT | TGC j ATG | CGT | ACC | TGC | GGT | - 
| Xma I I I | 1 Sph I | 

BPTI/M13 boundary 

|g|a|a|e|g|d|d|p|a|k|a| a | 

j 80 j 81 j 82 j 83 j 84 j 85 j 86 j 87 j 88 j 8 9 | 90 j 91 j 
| GGC j GCC j GCT | GAA j GGT | GAT | GAT j CCG j GCC | AAG | GCG | G /CC | - 

| Bbe I | | Sf i I 

Nar I 



| F | N | S | L | Q | A | S | A | T | 
j 92 j 93 j 94 | 95 j 96 j 97 j 98 j 99 j 100 j 
1 TTC | AAT | TCT | CTG | C , AA j GCT j TCT j GCT j ACC j - 
| Hind 3 | 
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Table 103 : Annotated Sequence 
of osp-ipbd gene 
(continued) 



|E|Y|I|G|Y|A|W| 
| 101 | 102 I 103 | 104 j 105 | 106 j 107 j 
j GAG j TAT j ATT j GGT j TAC j GCG j TGG | - 



10 

| A | M | V | V | V | I | V | G | A | 
j 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 
| GCC | ATG | GTG | GTG j GTT | AT /C j GTT [ GGT ] GCT | - 

| BstX I [ 

15 | Nco I | 

I T | I | G | I | 
| 117 | 118 | 119 | 120 | 
1 ACC , j ATC | GGG j ATC j - 

20 



|k|l|f|k|k|f|t|s|k|a| 

j 121 j 122 j 12 3 j 12 4 j 12 5 j 12 6 j 12 7 j 12 8 j 12 9 j 13 0 I 
j AAA j CTG j TTC j AAG j AAG j TTT j ACT j TCG j AAG | GCG j - 
25 |Asu Il| 

I s | . | . | . | 
| 131 | 132 | 133 | 134 | 

| TCT | TAA j TGA | TAG j GGTT A/CC- 

30 BstE II 



AGTCTA AGCCC ,GC CTAATGA GCGGGCT TTTTTTTT- 
terminator 

35 



a / (TCGA) , -3 1 
(Sal I) 





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Table 107: In vitro transcription/translation 
analysis of vector-encoded
signal::BPTI::mature VIII protein species

                      31 kd species a    14.5 kd species b

No DNA (control)           - c
pGEM-3Zf(-)                +
pGEM-MB16                  +
pGEM-MB20                  +                  +
pGEM-MB26                  +                  +
pGEM-MB42                  +                  +
pGEM-MB46                  ND                 ND

Notes:

a) pre-beta-lactamase, encoded by the amp (bla) gene.

b) pre-BPTI/VIII peptides encoded by the
synthetic gene and derived constructs.

c) - for absence of product; + for presence of
product; ND for Not Determined.
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Table 108, Vestern analysis' of in vivo 
signal = :BPTI •. mature VIII P 

pGEM-3Zf(-) _ 

pGEM-MBl6 VIII 

PGEM-MB20 VIII + + + +/ - 

10 pGEM-MB26 VIII 

pGEM-MB42 P^oA 

B) expressior^ 

15 p^iM^B42 +/ " 

Notes: . ■ .^bbit anti-BPTI polyclonal 

a) Analysis using rabbit ase _ conjug ated 



25 



synthetic gene. 

d) not present 

weakly present + ' 

present 

strong presence . • • • + 
very strong presence +++
Table 109: M13 gene III

1579 5'-GT GAAAAAATTA TTATTCGCAA TTCCTTTAGT
1611 TGTTCCTTTC TATTCTCACT CCGCTGAAAC TGTTGAAAGT
1651 TGTTTAGCAA AACCCCATAC AGAAAATTCA TTTACTAACG
1691 TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA
1731 TGAGGGTTGT CTGTGGAATG CTACAGGCGT TGTAGTTTGT
1771 ACTGGTGACG AAACTCAGTG TTACGGTACA TGGGTTCCTA
1811 TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA
1851 GGGTGGCGGT TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT
1891 ACTAAACCTC CTGAGTACGG TGATACACCT ATTCCGGGCT
1931 ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG
1971 TACTGAGCAA AACCCCGCTA ATCCTAATCC TTCTCTTGAG
2011 GAGTCTCAGC CTCTTAATAC TTTCATGTTT CAGAATAATA
2051 GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG
2091 CACTGTTACT CAAGGCACTG ACCCCGTTAA AACTTATTAC
2131 CAGTACACTC CTGTATCATC AAAAGCCATG TATGACGCTT
2171 ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG
2211 CTTTAATGAG GATCCATTCG TTTGTGAATA TCAAGGCCAA
2251 TCGTCTGACC TGCCTCAACC TCCTGTCAAT GCTGGCGGCG
2291 GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG
2331 CTCTGAGGGT GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA
2371 GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT GATTTTGATT
2411 ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA
2451 AAATGCCGAT GAAAACGCGC TACAGTCTGA CGCTAAAGGC
2491 AAACTTGATT CTGTCGCTAC TGATTACGGT GCTGCTATCG
2531 ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA
2571 TGGTGCTACT GGTGATTTTG CTGGCTCTAA TTCCCAAATG
2611 GCTCAAGTCG GTGACGGTGA TAATTCACCT TTAATGAATA
2651 ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA
2691 ATGTCGCCCT TTTGTCTTTA GCGCTGGTAA ACCATATGAA
2731 TTTTCTATTG ATTGTGACAA AATAAACTTA TTCCGTGGTG
2771 TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT
2811 ATTTTCTACG TTTGCTAACA TACTGCGTAA TAAGGAGTCT
2851 TAATCATGCC AGTTCTTTTG GGTATTCCGT
Table 110: Introduction of NarI into gene III

A) Wild-type gene III, portion encoding the signal peptide

          M   K   K   L   L   F   A   I   P   L
          1   2   3   4   5   6   7   8   9  10
1579 5'- GTG AAA AAA TTA TTA TTC GCA ATT CCT TTA

                                        / cleavage site
          V   V   P   F   Y   S   H   S   A   E   T   V
         11  12  13  14  15  16  17  18  19  20  21  22
1609     GTT GTT CCT TTC TAT TCT CAC TCC GCT GAA ACT GTT -3'

B) Gene III, portion encoding the signal peptide with NarI
site

          m   k   k   l   l   f   a   i   p   l
          1   2   3   4   5   6   7   8   9  10
1579 5'- gtg aaa aaa tta tta ttc gca att cct tta

                                        / cleavage site
          v   v   p   f   y   s   G   A   a   e   t   v
         11  12  13  14  15  16  17  18  19  20  21  22
1609     gtt gtt cct ttc tat tct GGc Gcc gct gaa act gtt -3'
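
The effect of the three base changes shown in capitals in part B can be checked by translating codons 11-22 of both versions and searching for the NarI recognition sequence GGCGCC. This is a minimal sketch assuming the standard genetic code; `translate` is a hypothetical helper and the sequences are copied from parts A and B above.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AA[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

NARI = "GGCGCC"

# Codons 11-22 of the gene III signal region, before and after the change.
WILD = "GTTGTTCCTTTCTATTCTCACTCCGCTGAAACTGTT"
MUTANT = "GTTGTTCCTTTCTATTCTGGCGCCGCTGAAACTGTT"


def translate(dna):
    """Translate an in-frame DNA string one codon at a time."""
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# 0-based positions at which the two sequences differ.
changed = [i for i, (w, m) in enumerate(zip(WILD, MUTANT)) if w != m]

print("wild type:", translate(WILD), "NarI at", WILD.find(NARI))
print("mutant:   ", translate(MUTANT), "NarI at", MUTANT.find(NARI))
print("changed positions:", changed)
```

The changes replace His17-Ser18 with Gly-Ala and create a single NarI site spanning codons 17-18, immediately before the marked cleavage site; `find` returns -1 for the wild-type segment because it carries no GGCGCC.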
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T 2 3 4 5 6 7 8 3 10 

*-4- a n- a ttc qca att cct tta 

5'-qtg aaa aaa.tta tta tec 3<- 

— gene III signal peptide — 



G A 
.7 18 

g« £t Tct ttc tat tct GQc_Gcc 



1 l £ 1? 14 15 16 17 - 



/ cleavage site 



15 



20 



|R|p|D|FlC|L|E| 
19-20 21 22 23 | 24 j 25 
I CGT \ CCG \ GAT | TTC | TGT \ CTC | I _ 

M13/BPTI Jnct - 1 

I TDflM T I J ■ 



25 



PflMl_____l 
— ' ]_Apa I 1 I 

| Dra II 1_ 



Pss I 



30 



40 



J\ 57 58 59 60 6l| 62 | 63 64 

tgc|cgt!gct|aag|cgt|aac|aac1ttt|aaa|- 



]_Esp_ 
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Table 111, continued 



S|A|E|D|C|M|R|T|C|G 
65| 66| 67| 68| 69 | 70 | 7 1 | 72 | 73 | 74 
TCG | GCC | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT 
IXmalll I | Sph I | 



G | A 
75| 76| 
GGC | GCC | 
Bbe I | 
Nar I | 



BPTI/M13 boundary 



G A a e t v e s 
77 78 79 80 81 82 83 84 
GGc Gcc get gaa act gtt GAA AGT 

1651 TGTTTAGCAA AACCCCATAC AGAAAATTCA TTTACTAACG 

1691 TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA 

17 31 TGAGGGTTGT CTGTGGAATG CTACAGGCGT TGTAGTTTGT 
1771 ACTGGTGACG AAACTCAGTG TTACGGTACA TGGGTTCCTA 
1811 TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA 
1851 GGGTGGCGGT TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT 

18 91 ACTAAACCTC CTGAGTACGG TGATACACCT ATTCCGGGCT 
1931 ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG 
1971 TACTGAGCAA AACCCCGCTA ATCCTAATCC TTCTCTTGAG 
2011 GAGTCTCAGC CTCTTAATAC TTTCATGTTT CAGAATAATA 
2 051 GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG 
2 091 CACTGTTACT CAAGGCACTG ACCCCGTTAA AACT T AT T AC 
2131 C AG T AC AC T C CTGTATCATC AAAAGCCATG TATGACGCTT 
2171 ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG 
2211 CTTTAATGAG GATCCATTCG TTTGTGAATA TCAAGGCCAA 
2251 TCGTCTGACC TGCCTCAACC TCCTGTCAAT GCTGGCGGCG 
2291 GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG 
2 331 CTCTGAGGGT GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA 
2 371 GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT GATTTTGATT 
2 411 AT G AAAAG AT GGCAAACGCT AATAAGGGGG CTATGACCGA 
2 4 51 AAATGCCGAT GAAAACGCGC TACAGTCTGA CGCTAAAGGC 






Table 111, continued 

2 4 91 AAACTTGATT CTGTCGCTAC TGATTACGGT GCTGCTATCG 

2 531 ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA 

2 571 TGGTGCTACT GGTGATTTTG CTGGCTCTAA TTCCCAAATG 

2 611 GCTCAAGTCG GTGACGGTGA TAATTCACCT TTAATGAATA 

2 651 ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA 

2 6 91 ATGTCGCCCT TTTGTCTTTA GCGCTGGTAA ACCATATGAA 

2 731 TTTTCTATTG ATTGTGACAA AATAAACTTA TTCCGTGGTG 

2 771 TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT 

2 811 ATTTTCTACG TTTGCTAACA TACTGCGTAA TAAGGAGTCT 

2 8 51 TAATCATGCC AGTTCTTTTG GGTATTCCGT 






Table 112: Annotated Sequence of
Ptac::RBS(GGAGGAAATAAA)::
VIII-signal::mature-bpti::mature-VIII-coat-protein gene



5'-GGATCC actccccatcccc 

J L 

Bam HI 

ctg TTGACA attaatcatcgGCTCG tataat GTGTGG- 
-35 tac -10 

a AT T GT GAGC G c T c AC AAT T - 
lacO-symm operator 

GAGCTC T ggagga AATAAA- 

SacI Shine-Dalgarno seq. 

|fM|K|K|S|L|V|L|K|A|S| 
|1|2|3|4|5|6|7|8|9|10| 
I ATG | AAG | AAA | TCT | CTG | GTT | CTT | AAG | GCT | AGC | - 

I Afl III Nhe I I 



|V|A|V|A|T|L|V|P|M|L| 
I 111 121 131 14 1 15| 16| 17| 18| 19| 20 | 
I GTT | GCT | GTC | GCG | ACC | CTG | GTA | CCT | ATG I TTG | 
I Nru I | | Kpn I | 

|S|F|A|R|P| D|F|C|L|E| 
I 21| 22| 23| 24| 25 | 2 6 | 27 1 28 | 29 | 30 | 
I TCC | TTC | GCT I CGT | CCG | GAT | TTC I TGT | CTC | GAG | 
I | AccIII | 1 Ava I | 



M13/BPTI Jnct | Xho I | 

|P|P|Y|T|G|P|C|K|A|R| 
I 311 32| 331 34| 35 I 3 6 1 37 | 38 I 39 I 40 | 
I CCA | CCA | TAC | ACT | GGG | CCC | TGC | AAA | GCG | CGC | 

I PflM I j_ | | |BssH II| 

Apa I | | 



1 Dra 


II 1 


I Pss 


I 1 
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10 



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Table 112 : Annotated Sequence of 
Ptac : : RBS (GGAGGAAATAAA) : : 
VI II -signal : : mature-bpti : : mature- VI II - coat -protein gene 



I | I | R 
41 | 42| 43 



G 
51 



L 
52 



C 
53 



Y 
44 



Q I 

54 I 



45 



T 
55 



| Stu I 



| C | R 
| 61 | 62 
| TGC | CGT 



(continued) 




Y | N | A | 


K | A 


46 | 47 | 48 


49| 50 


TAC | AAT | GCT 


AAAjGC 


F | V | .Y 


G | G 


56 | 57 | 58 


59| 60 


TTT | GTA | TAC 


GGT | GGT 


| Acc I 




1 Xca I 





a|k|r|n|n|f| 

63 | 64 I 65 I 66 | 67 | 68 j 
GCT | AAG | CGT | AAC j AAC j TTT j 
Esp I 1 



s|a|e|d|c|m|r|t| 

70 | 7l| 72 | 73 j 74 j 75 | 76 j 77 | 
TCG I GCC | GAA | GAT j TGC | ATG | CGT | ACC | 
Ixmalll 1 | Sph I 1 



K | 
69| 
AAA I 



C I 
78 | 
TGC 



G I 
79 | 

GGT I 



BPTI/M13 boundary 



G I A 
80 | 81 
| GGC | GCC 
1 Bbe I 
I Nar I 



a|e|g|d|d|p|a|k|a|a| 

82 I 83 | 84 1 85| 86 j 87 j 88 j 89 | 90 | 91 | 
GCT | GAA | GGT | GAT | GAT | CCG | GCC j AAG j GCG j GCC j 

I Sfi I I 



|F|N|S|L|Q|A|S|A|T| 
35 j 92 j 93| 94| 95 j 96 j 97 j 98 j 99|l00| 
| TTC | AAT | TCT | CTG | CAA | GCT | TCT | GCT | ACC j 

I Hind 3 I 



|E|Y|I|G|Y|A|W| 
40 j 101 j 102 | 103 | 104 | 105 | 106 | 107 | 
GAG TAT ATT GGT TAC GCG TGG 
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Table 112 : Annotated Sequence of 
Ptac : : RBS ( GGAGGAAAT AAA ) : : 
VIII - signal : : mature -bpti : : mature-VI,II -coat-protein gene 

(continued) 

5 

|a|m|v|v|v|i|v|g|a| 

1 108 1 109 I 110 | 111 j 112 | 113 | 114 | 115 | 116 j 
| GCC | ATG | GTG | GTG j GTT | ATC | GTT | GGT | GCT j - 

| BstX I j_ 

10 1 Nco I | 

| T | I | G | I | 
j 117 j 118 | 119 | 120 | 
| ACC | ATC | GGG | ATC | - 

15 

|k|l|f|k|k|f|t|s|k|a| 

1 121 I 122 j 123 I 124 | 125 | 126 | 127 | 12 8 j 129 j 130 j 
| AAA | CTG | TTC | AAG | AAG | TTT j ACT | TCG j AAG | GCG j - 

|Asu II| 

20 

| S | . | . | - | 

| 131 | 132 | 133 | 134 | 
| TCT j TAA | TGA | TAG j GGTTACC - 

Bst E II 

25 

AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT- 
terminator 



30 aTCGA GACctgca GGTCGACC ggcatgc-3 1 

| Sail | 






Table 113 : Annotated Sequence of 
pGEM-MB42 comprising Ptac : : RBS (GGAGGAAATAAA) : : ^ 
phoA-signal : : mature-bpti : : mature -VI II -coat -protein 

D^A seque nce : Sect tbuo:av*2 



5'-GGATCC actccccatcccc
   BamHI

ctg TTGACA attaatcatcgGCTCG tataat GTGTGG-
    -35          tac         -10

aATTGTGAGCGcTcACAATT-
lacO-symm operator

                       |  M  |  K  |  Q  |  S  |  T  |
                       |  1  |  2  |  3  |  4  |  5  |
GAGCTCCATGGGAGAAAATAAA | ATG | AAA | CAA | AGC | ACG | -
| SacI |               |< phoA signal peptide

|  I  |  A  |  L  |  L  |  P  |  L  |  L  |  F  |  T  |  P  |  V  |  T  |
|  6  |  7  |  8  |  9  | 10  | 11  | 12  | 13  | 14  | 15  | 16  | 17  |
| ATC | GCA | CTC | TTA | CCG | TTA | CTG | TTT | ACC | CCT | GTG | ACA | -
  phoA signal continues

(There are no residues 20-23.)

|  K  |  A  |  R  |  P  |  D  |  F  |  C  |  L  |  E  |
| 18  | 19  | 24  | 25  | 26  | 27  | 28  | 29  | 30  |
| AAA | GCC | CGT | CCG | GAT | TTC | TGT | CTC | GAG | -
phoA signal->|  | AccIII |                  | Ava I |
phoA/BPTI Jnct                              | Xho I |
             |< BPTI insert
Table 113: Annotated Sequence of
Ptac::RBS(GGAGGAAATAAA)::
phoA-signal::mature-bpti::mature-VIII-coat-protein gene

(continued)

|  P  |  P  |  Y  |  T  |  G  |  P  |  C  |  K  |  A  |  R  |
| 31  | 32  | 33  | 34  | 35  | 36  | 37  | 38  | 39  | 40  |
| CCA | CCA | TAC | ACT | GGG | CCC | TGC | AAA | GCG | CGC | -
      |      PflM I       |                     | BssH II |
                        |  Apa I  |
                        |  Dra II |
                        |  Pss I  |

|  I  |  I  |  R  |  Y  |  F  |  Y  |  N  |  A  |  K  |  A  |
| 41  | 42  | 43  | 44  | 45  | 46  | 47  | 48  | 49  | 50  |
| ATC | ATC | CGC | TAT | TTC | TAC | AAT | GCT | AAA | GC  |

  |  G  |  L  |  C  |  Q  |  T  |  F  |  V  |  Y  |  G  |  G  |
  | 51  | 52  | 53  | 54  | 55  | 56  | 57  | 58  | 59  | 60  |
A | GGC | CTG | TGC | CAG | ACC | TTT | GTA | TAC | GGT | GGT |
|  Stu I  |                           |   Acc I   |
                                      |   Xca I   |

|  C  |  R  |  A  |  K  |  R  |  N  |  N  |  F  |  K  |
| 61  | 62  | 63  | 64  | 65  | 66  | 67  | 68  | 69  |
| TGC | CGT | GCT | AAG | CGT | AAC | AAC | TTT | AAA |
            |   Esp I    |

|  S  |  A  |  E  |  D  |  C  |  M  |  R  |  T  |  C  |  G  |
| 70  | 71  | 72  | 73  | 74  | 75  | 76  | 77  | 78  | 79  |
| TCG | GCC | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT |
   | XmaIII |            |   Sph I   |

BPTI insert-

BPTI/M13 boundary
|  G  |  A  |
| 80  | 81  |
| GGC | GCC |
|   Bbe I   |
|   Nar I   |

|  A  |  E  |  G  |  D  |  D  |  P  |  A  |  K  |  A  |  A  |
| 82  | 83  | 84  | 85  | 86  | 87  | 88  | 89  | 90  | 91  |
| GCT | GAA | GGT | GAT | GAT | CCG | GCC | AAG | GCG | GCC |
                                    |         Sfi I         |

-- BPTI-->|< mature gene VIII coat protein






Table 113: Annotated Sequence of
Ptac::RBS(GGAGGAAATAAA)::
phoA-signal::mature-bpti::mature-VIII-coat-protein gene

(continued)

|  F  |  N  |  S  |  L  |  Q  |  A  |  S  |  A  |  T  |
| 92  | 93  | 94  | 95  | 96  | 97  | 98  | 99  | 100 |
| TTC | AAT | TCT | CTG | CAA | GCT | TCT | GCT | ACC | -
                          | Hind 3 |

|  E  |  Y  |  I  |  G  |  Y  |  A  |  W  |
| 101 | 102 | 103 | 104 | 105 | 106 | 107 |
| GAG | TAT | ATT | GGT | TAC | GCG | TGG | -

|  A  |  M  |  V  |  V  |  V  |  I  |  V  |  G  |  A  |
| 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 |
| GCC | ATG | GTG | GTG | GTT | ATC | GTT | GGT | GCT | -
   |     BstX I       |
   | Nco I |

|  T  |  I  |  G  |  I  |
| 117 | 118 | 119 | 120 |
| ACC | ATC | GGG | ATC | -

|  K  |  L  |  F  |  K  |  K  |  F  |  T  |  S  |  K  |  A  |
| 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 |
| AAA | CTG | TTC | AAG | AAG | TTT | ACT | TCG | AAG | GCG | -
                                        |Asu II|

|  S  |  .  |  .  |  .  |
| 131 | 132 | 133 | 134 |
| TCT | TAA | TGA | TAG | GGTTACC -
                          BstE II

AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT-
            terminator

aTCGA GACctgca GGTCGAC-3'
               | Sal I |
Table 114: Neutralization of Phage Titer Using
Agarose-immobilized Anhydro-Trypsin

                           Percent Residual Titer
                           As a Function of Time (hours)
Phage Type   Addition        1       2       4

MK-BPTI       5 µl IS        99     104     105
              2 µl IAT       82      71      51
              5 µl IAT       57      40      27
             10 µl IAT       40      30      24

MK            5 µl IS       106      96      98
              2 µl IAT       97     103      95
              5 µl IAT      110     111      96
             10 µl IAT       99      93     106

Legend:

IS = Immobilized streptavidin
IAT = Immobilized anhydro-trypsin



Table 115: Affinity Selection of MK-BPTI Phage
on Immobilized Anhydro-Trypsin

                           Percent of Total Phage
Phage Type   Addition      Recovered in Elution Buffer

MK-BPTI       5 µl IS        <<1 (a)
              2 µl IAT         5
              5 µl IAT        20
             10 µl IAT        50

MK            5 µl IS        <<1 (a)
              2 µl IAT       <<1
              5 µl IAT       <<1
             10 µl IAT       <<1

Legend:

IS = Immobilized streptavidin
IAT = Immobilized anhydro-trypsin
(a) not detectable.
Table 130: Sampling of a Library encoded by (NNK)6

A. Numbers of hexapeptides in each class

total = 64,000,000 stop-free sequences.

α can be one of [W,M,F,Y,C,I,K,D,E,N,H,Q]
Φ can be one of [P,T,A,V,G]
Ω can be one of [S,L,R]

αααααα = 2985984.     Φααααα = 7464960.
Ωααααα = 4478976.     ΦΦαααα = 7776000.
ΦΩαααα = 9331200.     ΩΩαααα = 2799360.
ΦΦΦααα = 4320000.     ΦΦΩααα = 7776000.
ΦΩΩααα = 4665600.     ΩΩΩααα =  933120.
ΦΦΦΦαα = 1350000.     ΦΦΦΩαα = 3240000.
ΦΦΩΩαα = 2916000.     ΦΩΩΩαα = 1166400.
ΩΩΩΩαα =  174960.     ΦΦΦΦΦα =  225000.
ΦΦΦΦΩα =  675000.     ΦΦΦΩΩα =  810000.
ΦΦΩΩΩα =  486000.     ΦΩΩΩΩα =  145800.
ΩΩΩΩΩα =   17496.     ΦΦΦΦΦΦ =   15625.
ΦΦΦΦΦΩ =   56250.     ΦΦΦΦΩΩ =   84375.
ΦΦΦΩΩΩ =   67500.     ΦΦΩΩΩΩ =   30375.
ΦΩΩΩΩΩ =    7290.     ΩΩΩΩΩΩ =     729.

ΦΦΩΩαα, for example, stands for the set of peptides having two
amino acids from the α class, two from Φ, and two from Ω
arranged in any order. There are, for example, 729 = 3^6
sequences composed entirely of S, L, and R.
Table 130: Sampling of a Library encoded by (NNK)6

(continued)

B. Probability that any given stop-free DNA
sequence will encode a hexapeptide from a
stated class.

                  P          % of class
αααααα...     3.364E-03     (1.13E-07)
Φααααα...     1.682E-02     (2.25E-07)
Ωααααα...     1.514E-02     (3.38E-07)
ΦΦαααα...     3.505E-02     (4.51E-07)
ΦΩαααα...     6.308E-02     (6.76E-07)
ΩΩαααα...     2.839E-02     (1.01E-06)
ΦΦΦααα...     3.894E-02     (9.01E-07)
ΦΦΩααα...     1.051E-01     (1.35E-06)
ΦΩΩααα...     9.463E-02     (2.03E-06)
ΩΩΩααα...     2.839E-02     (3.04E-06)
ΦΦΦΦαα...     2.434E-02     (1.80E-06)
ΦΦΦΩαα...     8.762E-02     (2.70E-06)
ΦΦΩΩαα...     1.183E-01     (4.06E-06)
ΦΩΩΩαα...     7.097E-02     (6.08E-06)
ΩΩΩΩαα...     1.597E-02     (9.13E-06)
ΦΦΦΦΦα...     8.113E-03     (3.61E-06)
ΦΦΦΦΩα...     3.651E-02     (5.41E-06)
ΦΦΦΩΩα...     6.571E-02     (8.11E-06)
ΦΦΩΩΩα...     5.914E-02     (1.22E-05)
ΦΩΩΩΩα...     2.661E-02     (1.83E-05)
ΩΩΩΩΩα...     4.790E-03     (2.74E-05)
ΦΦΦΦΦΦ...     1.127E-03     (7.21E-06)
ΦΦΦΦΦΩ...     6.084E-03     (1.08E-05)
ΦΦΦΦΩΩ...     1.369E-02     (1.62E-05)
ΦΦΦΩΩΩ...     1.643E-02     (2.43E-05)
ΦΦΩΩΩΩ...     1.109E-02     (3.65E-05)
ΦΩΩΩΩΩ...     3.992E-03     (5.48E-05)
ΩΩΩΩΩΩ...     5.988E-04     (8.21E-05)
Table 130: Sampling of a Library encoded by (NNK)6

(continued)

C. Number of different stop-free amino-acid
sequences in each class expected for various
library sizes

Library size = 1.0000E+06

total = 9.7446E+05     % sampled = 1.52

Class        Number      %
ΦΦΦΦαα...    24119.9 (  1.8)     ΦΦΦΩαα...    86442.5 (  2.7)
ΦΦΩΩαα...   115915.5 (  4.0)     ΦΩΩΩαα...    68853.5 (  5.9)
ΩΩΩΩαα...    15261.1 (  8.7)     ΦΦΦΦΦα...     7968.1 (  3.5)
ΦΦΦΦΩα...    35537.2 (  5.3)     ΦΦΦΩΩα...    63117.5 (  7.8)
ΦΦΩΩΩα...    55684.4 ( 11.5)     ΦΩΩΩΩα...    24325.9 ( 16.7)
ΩΩΩΩΩα...     4190.6 ( 24.0)     ΦΦΦΦΦΦ...     1087.1 (  7.0)
ΦΦΦΦΦΩ...     5767.0 ( 10.3)     ΦΦΦΦΩΩ...    12637.2 ( 15.0)
ΦΦΦΩΩΩ...    14581.7 ( 21.6)     ΦΦΩΩΩΩ...     9290.2 ( 30.6)
ΦΩΩΩΩΩ...     3073.9 ( 42.2)     ΩΩΩΩΩΩ...      408.4 ( 56.0)


Library size = 3.0000E+06

total = 2.7885E+06     % sampled = 4.36

αααααα...    10076.4 (   .3)
ΦΦΦααα...   115256.6 (  2.7)     ΦΦΩααα...   309107.9 (  4.0)
ΦΩΩααα...   275413.9 (  5.9)     ΩΩΩααα...    81392.5 (  8.7)
ΦΦΦΦαα...    71074.5 (  5.3)     ΦΦΦΩαα...   252470.2 (  7.8)
ΦΦΩΩαα...   334106.2 ( 11.5)     ΦΩΩΩαα...   194606.9 ( 16.7)
ΩΩΩΩαα...    41905.9 ( 24.0)     ΦΦΦΦΦα...    23067.8 ( 10.3)
ΦΦΦΦΩα...   101097.3 ( 15.0)     ΦΦΦΩΩα...   174981.0 ( 21.6)
ΦΦΩΩΩα...   148643.7 ( 30.6)     ΦΩΩΩΩα...    61478.9 ( 42.2)
ΩΩΩΩΩα...     9801.0 ( 56.0)     ΦΦΦΦΦΦ...     3039.6 ( 19.5)
ΦΦΦΦΦΩ...    15587.7 ( 27.7)     ΦΦΦΦΩΩ...    32516.8 ( 38.5)
ΦΦΦΩΩΩ...    34975.6 ( 51.8)     ΦΦΩΩΩΩ...    20215.5 ( 66.6)
ΦΩΩΩΩΩ...     5879.9 ( 80.7)     ΩΩΩΩΩΩ...      667.0 ( 91.5)
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Table 130: Sampling of a Library encoded by (NNK) 

(continued) 



Library size = 1.0000E+07 

5 



total 


- 8 1204E+06 


% 


sampled = 


12.69 








(yryrycyCYCY 


33455 9 f 


1 . 


1) 


<i>Q:Q!QfQ!Q! . 


1 66342 


. 4 1 
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2) 


\L\JLKJLKX\JL^JL • • 
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—> • 


3) 




349 6 8 S 
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O • 


6) 




Qft 1 6 

.. Jl lO 
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1 ft 
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r 9 
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; 23 


.7) 
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3) 


cbOOO/vrv 


531651 


.3 ( 


; 45 


.6) 


QQQ^aof . . 


104722. 3( 


59. 


9) 
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68111 
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: 30 
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41 . 


8) 
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[ 55 
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4) 
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5) 
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6) 


QQQQQQ . 


• . / Z o 


Q 
. O 


^ X u u 




Library size = 3.0000E+07

total = 1.8633E+07     % sampled = 29.11








/~\t r\t s\i f~\i r\i 


99247. 4( 


3 
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- o ; 




1712943 .0 ( 
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z d y Z O D D 
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f c: Q 
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41 


-8) 
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-] O Art/1 Ol 


. U 




• 6 ; 




ΦΦΩΩαα...  2052433.0 ( 70.4)     ΦΩΩΩαα...   978420.5 ( 83.9)
ΩΩΩΩαα...   163640.3 ( 93.5)     ΦΦΦΦΦα...   148719.7 ( 66.1)
ΦΦΦΦΩα...   541755.7 ( 80.3)     ΦΦΦΩΩα...   738960.1 ( 91.2)
ΦΦΩΩΩα...   473377.0 ( 97.4)     ΦΩΩΩΩα...   145189.7 ( 99.6)
ΩΩΩΩΩα...    17491.3 (100.0)     ΦΦΦΦΦΦ...    13829.1 ( 88.5)
ΦΦΦΦΦΩ...    54058.1 ( 96.1)     ΦΦΦΦΩΩ...    83726.0 ( 99.2)
ΦΦΦΩΩΩ...    67454.5 ( 99.9)     ΦΦΩΩΩΩ...    30374.5 (100.0)
ΦΩΩΩΩΩ...     7290.0 (100.0)     ΩΩΩΩΩΩ...      729.0 (100.0)
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Table 130: Sampling of a Library encoded by (NNK) 6 

(continued) 

Library size = 7.6000E+07 

5 

total = 3.2125E+07 % sampled = 50.19 
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<$>$$>Qota . . . 
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( *3 zr 
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4585163 


.0 ( 49. 


1) 


(~) C~)/^\y /"v /"v /"v 
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X / o J y JZ 


n 
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rfSrfSrfS /a/ /a/ /~y/ 


2566085 


. 0 ( 59. 


4) 


rT\i^if~) /*\y s\t s\i 

SPVizQfCtQf . . . 


c: 7 zr /i o qi 


. u 


V / . 


i ^ 
J- ; 


<i>QQa;Qfa . . 


4051713 


. 0 ( 86 . 


8) 


QQQaofo; . . . 


888584 


. 3 


( 95 . 


2) 




1127473 


. 0 ( 83 . 


5) 


3>3>3>QofQr . . . 


3023170 


. 0 


( 93. 


3) 


<M>QQq;o/ . . 


2865517 


.0( 98. 


3) 




1163743 


. 0 


( 99. 


8) 




174941 


. 0 (100 . 


0) 




218886 


. 6 


( 97 . 


3) 




671976 


. 9 ( 99 . 


6) 


<£<f><£QQaf. . . 


809757 


.3 


(100 . 


0) 




485997.5(100.0) 




<£QQQQar . . . 


145800 


. 0 


(100. 


0) 


QQQQQa. . 


17496.0(100.0) 




<£3><i><£<l><i> . . . 


15613 


. 5 


( 99. 


9) 




56248 


. 9 (100 . 


0) 


3>3><M>QQ. . . 


84375 


. 0 


(100. 


0) 




67500 


. 0 (100 . 


0) 


<£<£QQQQ. . . 


30375 


. 0 


(100. 


0) 


<i>QQQQQ . 


7290 


. 0 (100 . 


0) 




729 


. 0 


(100 . 


0) 
Table 130: Sampling of a Library encoded by (NNK)6

(continued)

Library size = 3.0000E+08

total = 5.2634E+07     % sampled = 82.24



rv rv r\/ rv 


*J — > \J — > -L 




7 ) 




J OOO 1 J v 


. 0 ( 49 


1) 


O rv /"v /"V rv 






7) 


*±"4r UCUlLa ... 


57643 91 


. 0 ( 74 


1) 


H?\£UtUCUCUC ... 


O 1 U J 1 Z, D 


0 f ft£ 


ft ) 


OOa/a/a/a/ 


ZUU J f ~J ~j 






f\i f\l /*V/ 






-5 ) 


fT^(--T-s ( ) /^\» f\j 


7^/11 070 
/DtIj / O 


n ( Qft 




ΦΩΩααα...  4654972.0 ( 99.8)     ΩΩΩααα...   933018.6 (100.0)
ΦΦΦΦαα...  1343954.0 ( 99.6)     ΦΦΦΩαα...  3239029.0 (100.0)
ΦΦΩΩαα...  2915985.0 (100.0)     ΦΩΩΩαα...  1166400.0 (100.0)
ΩΩΩΩαα...   174960.0 (100.0)     ΦΦΦΦΦα...   224995.5 (100.0)
ΦΦΦΦΩα...   674999.9 (100.0)     ΦΦΦΩΩα...   810000.0 (100.0)
ΦΦΩΩΩα...   486000.0 (100.0)     ΦΩΩΩΩα...   145800.0 (100.0)
ΩΩΩΩΩα...    17496.0 (100.0)     ΦΦΦΦΦΦ...    15625.0 (100.0)
ΦΦΦΦΦΩ...    56250.0 (100.0)     ΦΦΦΦΩΩ...    84375.0 (100.0)
ΦΦΦΩΩΩ...    67500.0 (100.0)     ΦΦΩΩΩΩ...    30375.0 (100.0)
ΦΩΩΩΩΩ...     7290.0 (100.0)     ΩΩΩΩΩΩ...      729.0 (100.0)



Library size = 1.0000E+09

total = 6.1999E+07     % sampled = 96.87

αααααα...  2018278.0 ( 67.6)     Φααααα...  6680917.0 ( 89.5)
Ωααααα...  4326519.0 ( 96.6)     ΦΦαααα...  7690221.0 ( 98.9)
ΦΩαααα...  9320389.0 ( 99.9)     ΩΩαααα...  2799250.0 (100.0)
ΦΦΦααα...  4319475.0 (100.0)     ΦΦΩααα...  7775990.0 (100.0)
ΦΩΩααα...  4665600.0 (100.0)     ΩΩΩααα...   933120.0 (100.0)
ΦΦΦΦαα...  1350000.0 (100.0)     ΦΦΦΩαα...  3240000.0 (100.0)
ΦΦΩΩαα...  2916000.0 (100.0)     ΦΩΩΩαα...  1166400.0 (100.0)
ΩΩΩΩαα...   174960.0 (100.0)     ΦΦΦΦΦα...   225000.0 (100.0)
ΦΦΦΦΩα...   675000.0 (100.0)     ΦΦΦΩΩα...   810000.0 (100.0)
ΦΦΩΩΩα...   486000.0 (100.0)     ΦΩΩΩΩα...   145800.0 (100.0)
ΩΩΩΩΩα...    17496.0 (100.0)     ΦΦΦΦΦΦ...    15625.0 (100.0)
ΦΦΦΦΦΩ...    56250.0 (100.0)     ΦΦΦΦΩΩ...    84375.0 (100.0)
ΦΦΦΩΩΩ...    67500.0 (100.0)     ΦΦΩΩΩΩ...    30375.0 (100.0)
ΦΩΩΩΩΩ...     7290.0 (100.0)     ΩΩΩΩΩΩ...      729.0 (100.0)
Table 130: Sampling of a Library encoded by (NNK)6

(continued)

Library size = 3.0000E+09

total = 6.3890E+07     % sampled = 99.83

ΦΩΩααα...  4665600.0 (100.0)     ΩΩΩααα...   933120.0 (100.0)
ΦΦΦΦαα...  1350000.0 (100.0)     ΦΦΦΩαα...  3240000.0 (100.0)
ΦΦΩΩαα...  2916000.0 (100.0)     ΦΩΩΩαα...  1166400.0 (100.0)
ΩΩΩΩαα...   174960.0 (100.0)     ΦΦΦΦΦα...   225000.0 (100.0)
ΦΦΦΦΩα...   675000.0 (100.0)     ΦΦΦΩΩα...   810000.0 (100.0)
ΦΦΩΩΩα...   486000.0 (100.0)     ΦΩΩΩΩα...   145800.0 (100.0)
ΩΩΩΩΩα...    17496.0 (100.0)     ΦΦΦΦΦΦ...    15625.0 (100.0)
ΦΦΦΦΦΩ...    56250.0 (100.0)     ΦΦΦΦΩΩ...    84375.0 (100.0)
ΦΦΦΩΩΩ...    67500.0 (100.0)     ΦΦΩΩΩΩ...    30375.0 (100.0)
ΦΩΩΩΩΩ...     7290.0 (100.0)     ΩΩΩΩΩΩ...      729.0 (100.0)
Table 130, continued

D. Formulae for tabulated quantities.

Lsize is the number of independent transformants.

31**6 is 31 to sixth power; 6*3 means 6 times 3.

A = Lsize/(31**6)

α can be one of [WMFYCIKDENHQ]

Φ can be one of [PTAVG]

Ω can be one of [SLR]

F0 = (12)**6     F1 = (12)**5     F2 = (12)**4

F3 = (12)**3     F4 = (12)**2     F5 = (12)

F6 = 1

αααααα = F0 * (1-exp(-A))
Φααααα = 6 * 5 * F1 * (1-exp(-2*A))
Ωααααα = 6*3 * F1 * (1-exp(-3*A))
ΦΦαααα = (15) * 5**2 * F2 * (1-exp(-4*A))
ΦΩαααα = (6*5)*5*3 * F2 * (1-exp(-6*A))
ΩΩαααα = (15) * 3**2 * F2 * (1-exp(-9*A))
ΦΦΦααα = (20)*(5**3) * F3 * (1-exp(-8*A))
ΦΦΩααα = (60)*(5*5*3)*F3*(1-exp(-12*A))
ΦΩΩααα = (60)*(5*3*3)*F3*(1-exp(-18*A))
ΩΩΩααα = (20)*(3)**3*F3*(1-exp(-27*A))
ΦΦΦΦαα = (15)*(5)**4*F4*(1-exp(-16*A))
ΦΦΦΩαα = (60)*(5)**3*3*F4*(1-exp(-24*A))
ΦΦΩΩαα = (90)*(5*5*3*3)*F4*(1-exp(-36*A))
ΦΩΩΩαα = (60)*(5*3*3*3)*F4*(1-exp(-54*A))
ΩΩΩΩαα = (15)*(3)**4 * F4 * (1-exp(-81*A))
ΦΦΦΦΦα = (6)*(5)**5 * F5 * (1-exp(-32*A))
ΦΦΦΦΩα = 30*5*5*5*5*3*F5*(1-exp(-48*A))
ΦΦΦΩΩα = 60*5*5*5*3*3*F5*(1-exp(-72*A))
ΦΦΩΩΩα = 60*5*5*3*3*3*F5*(1-exp(-108*A))
ΦΩΩΩΩα = 30*5*3*3*3*3*F5*(1-exp(-162*A))
ΩΩΩΩΩα = 6*3*3*3*3*3*F5*(1-exp(-243*A))
ΦΦΦΦΦΦ = 5**6 * (1-exp(-64*A))
ΦΦΦΦΦΩ = 6*3*5**5*(1-exp(-96*A))
ΦΦΦΦΩΩ = 15*3*3*5**4*(1-exp(-144*A))
ΦΦΦΩΩΩ = 20*3**3*5**3*(1-exp(-216*A))
ΦΦΩΩΩΩ = 15*3**4*5**2*(1-exp(-324*A))
ΦΩΩΩΩΩ = 6*3**5*5*(1-exp(-486*A))
ΩΩΩΩΩΩ = 3**6*(1-exp(-729*A))

total = αααααα + Φααααα + Ωααααα + ΦΦαααα + ΦΩαααα
      + ΩΩαααα + ΦΦΦααα + ΦΦΩααα + ΦΩΩααα + ΩΩΩααα
      + ΦΦΦΦαα + ΦΦΦΩαα + ΦΦΩΩαα + ΦΩΩΩαα + ΩΩΩΩαα
      + ΦΦΦΦΦα + ΦΦΦΦΩα + ΦΦΦΩΩα + ΦΦΩΩΩα + ΦΩΩΩΩα
      + ΩΩΩΩΩα + ΦΦΦΦΦΦ + ΦΦΦΦΦΩ + ΦΦΦΦΩΩ + ΦΦΦΩΩΩ
      + ΦΦΩΩΩΩ + ΦΩΩΩΩΩ + ΩΩΩΩΩΩ

(The amino acids referred to in Table 130 need not
be in sequence, but if they are, the sequences all
have SEQ ID NO: 88).
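
The formulae of part D can be evaluated with a short program. The following is only a minimal sketch in Python, not the program that produced Table 130; the names `class_rows` and `expected_sampled` are hypothetical. It assumes, as the table does, 31 stop-free NNK codons, with 12 amino acids encoded by one codon (α), 5 by two codons (Φ), and 3 by three codons (Ω).

```python
from math import comb, exp

N_STOP_FREE_CODONS = 31  # NNK gives 32 codons, one of which (TAG) is a stop


def class_rows(length=6):
    """Return (n_alpha, n_phi, n_omega, n_peptides, dna_per_peptide) per class.

    Classes are listed in the order used by Table 130: fewer non-alpha
    residues first, and within that, fewer omega residues first.
    """
    rows = []
    for n_other in range(length + 1):          # residues that are phi or omega
        for n_omega in range(n_other + 1):
            n_phi = n_other - n_omega
            n_alpha = length - n_other
            # ways to place the three kinds of residue along the peptide
            arrangements = comb(length, n_alpha) * comb(n_other, n_phi)
            n_peptides = arrangements * 12**n_alpha * 5**n_phi * 3**n_omega
            # each peptide in the class is encoded by this many DNA sequences
            dna_per_peptide = 2**n_phi * 3**n_omega
            rows.append((n_alpha, n_phi, n_omega, n_peptides, dna_per_peptide))
    return rows


def expected_sampled(lsize, length=6):
    """Expected number of distinct peptides seen per class for lsize transformants."""
    a = lsize / N_STOP_FREE_CODONS**length     # the quantity A of part D
    return [
        (n_alpha, n_phi, n_omega, n_pep * (1.0 - exp(-k * a)))
        for n_alpha, n_phi, n_omega, n_pep, k in class_rows(length)
    ]


if __name__ == "__main__":
    for lsize in (1.0e6, 1.0e9):
        total = sum(row[3] for row in expected_sampled(lsize))
        print(f"Lsize = {lsize:.4E}  total = {total:.4E}  "
              f"% sampled = {100 * total / 20**6:.2f}")
```

Each class term has the form N * (1-exp(-k*A)), where N is the number of peptides in the class and k is the number of DNA sequences encoding each of them, so the loop reproduces every line of part D without listing the 28 classes by hand.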
Table 131: Sampling of a Library
Encoded by (NNT)4(NNG)2

X can be F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G

Γ can be L²,R²,S,W,P,Q,M,T,K,V,A,E,G

Library comprises 8.55·10^6 amino-acid sequences; 1.47·10^7
DNA sequences.

Total number of possible aa sequences = 8,555,625

x    L,V,P,T,A,R,G,F,Y,C,H,I,N,D
S    S
θ    V,P,T,A,G,W,Q,M,K,E,S
Ω    L,R

The first, second, fifth, and sixth positions can
hold x or S; the third and fourth positions can hold θ or
Ω. I have lumped sequences by the number of xs, Ss, θs,
and Ωs.

For example xxθΩSS stands for:

[xxθΩSS, xSθΩxS, xSθΩSx, SSθΩxx, SxθΩxS, SxθΩSx,
 xxΩθSS, xSΩθxS, xSΩθSx, SSΩθxx, SxΩθxS, SxΩθSx]

The following table shows the likelihood that any
particular DNA sequence will fall into one of the defined
classes.

Library size = 1.0     Sampling = .00001%

total      1.0000E+00     % sampled  1.1688E-07

xxθθxx     3.1524E-01
xxθΩxx     2.2926E-01
xxΩΩxx     4.1684E-02
xxθθxS     1.8013E-01
xxθΩxS     1.3101E-01
xxΩΩxS     2.3819E-02
xxθθSS     3.8600E-02
xxθΩSS     2.8073E-02
xxΩΩSS     5.1042E-03
xSθθSS     3.6762E-03
xSθΩSS     2.6736E-03
xSΩΩSS     4.8611E-04
SSθθSS     1.3129E-04
SSθΩSS     9.5486E-05
SSΩΩSS     1.7361E-05
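
The class likelihoods of Table 131 follow from codon counting, as the sketch below illustrates. It is an illustrative Python fragment under stated assumptions, not the original calculation: the function name `class_probability` is hypothetical, each NNT position is taken to have 16 codons (2 for S, 14 for the single-codon class x), and each NNG position is taken to have 15 non-stop codons (4 for the two-codon class Ω, that is L and R, and 11 for the single-codon class θ).

```python
from math import comb


def class_probability(n_s, n_omega):
    """Probability that a stop-free (NNT)4(NNG)2 DNA sequence falls in a class.

    n_s     -- number of S residues among the four NNT positions (0..4)
    n_omega -- number of L/R residues among the two NNG positions (0..2)
    """
    # NNT positions: S has 2 of 16 codons, the other 14 amino acids 1 each
    p_nnt = comb(4, n_s) * (2 / 16) ** n_s * (14 / 16) ** (4 - n_s)
    # NNG positions: L and R have 2 codons each out of 15 non-stop codons
    p_nng = comb(2, n_omega) * (4 / 15) ** n_omega * (11 / 15) ** (2 - n_omega)
    return p_nnt * p_nng


N_AA_SEQUENCES = 15**4 * 13**2    # distinct amino-acid sequences
N_DNA_SEQUENCES = 16**4 * 15**2   # distinct stop-free DNA sequences

if __name__ == "__main__":
    print(N_AA_SEQUENCES, N_DNA_SEQUENCES)
    for n_s in range(5):
        for n_omega in range(3):
            label = "x" * (4 - n_s) + "S" * n_s
            print(label, n_omega, f"{class_probability(n_s, n_omega):.4E}")
```

The fifteen probabilities sum to one, which gives a quick consistency check on any reconstructed row of the table.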
Table 131: Sampling of a Library
Encoded by (NNT)4(NNG)2
(continued)

The following sections show how many sequences
of each class are expected for libraries of
different sizes.

Library size = 1.0000E+05

total 9.9137E+04     fraction sampled = 1.1587E-02

Type      Number      %          Type      Number      %
xxθθxx     31416.9 (   .7)       xxθΩxx     22771.4 (  1.3)
xxΩΩxx      4112.4 (  2.7)       xxθθxS     17891.8 (  1.3)
xxθΩxS     12924.6 (  2.7)       xxΩΩxS      2318.5 (  5.3)
xxθθSS      3808.1 (  2.7)       xxθΩSS      2732.5 (  5.3)
xxΩΩSS       483.7 ( 10.3)       xSθθSS       357.8 (  5.3)
xSθΩSS       253.4 ( 10.3)       xSΩΩSS        43.7 ( 19.5)
SSθθSS        12.4 ( 10.3)       SSθΩSS         8.6 ( 19.5)
SSΩΩSS         1.4 ( 35.2)

Library size = 1.0000E+06

total 9.2064E+05     fraction sampled = 1.0761E-01

xxθθxx    304783.9 (  6.6)       xxθΩxx    214394.0 ( 12.7)
xxΩΩxx     36508.6 ( 23.8)       xxθθxS    168452.5 ( 12.7)
xxθΩxS    114741.4 ( 23.8)       xxΩΩxS     18383.8 ( 41.9)
xxθθSS     33807.7 ( 23.8)       xxθΩSS     21666.6 ( 41.9)
xxΩΩSS      3114.6 ( 66.2)       xSθθSS      2837.3 ( 41.9)
xSθΩSS      1631.5 ( 66.2)       xSΩΩSS       198.4 ( 88.6)
SSθθSS        80.1 ( 66.2)       SSθΩSS        39.0 ( 88.6)
SSΩΩSS         3.9 ( 98.7)

Library size = 3.0000E+06

total 2.3880E+06     fraction sampled = 2.7912E-01

xxθθxx    855709.5 ( 18.4)       xxθΩxx    565051.6 ( 33.4)
xxΩΩxx     85564.7 ( 55.7)       xxθθxS    443969.1 ( 33.4)
xxθΩxS    268917.8 ( 55.7)       xxΩΩxS     35281.3 ( 80.4)
xxθθSS     79234.7 ( 55.7)       xxθΩSS     41581.5 ( 80.4)
xxΩΩSS      4522.6 ( 96.1)       xSθθSS      5445.2 ( 80.4)
xSθΩSS      2369.0 ( 96.1)       xSΩΩSS       223.7 ( 99.9)
SSθθSS       116.3 ( 96.1)       SSθΩSS        43.9 ( 99.9)
SSΩΩSS         4.0 (100.0)
Table 131: Sampling of a Library
Encoded by (NNT)4(NNG)2
(continued)

Library size = 8.5556E+06

total 4.9303E+06     fraction sampled = 5.7626E-01

xxθθxx   2046301.0 ( 44.0)       xxθΩxx   1160645.0 ( 68.7)
xxΩΩxx    138575.9 ( 90.2)       xxθθxS    911935.6 ( 68.7)
xxθΩxS    435524.3 ( 90.2)       xxΩΩxS     43480.7 ( 99.0)
xxθθSS    128324.1 ( 90.2)       xxθΩSS     51245.1 ( 99.0)
xxΩΩSS      4703.6 (100.0)       xSθθSS      6710.7 ( 99.0)
xSθΩSS      2463.8 (100.0)       xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)       SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Library size = 1.0000E+07

total 5.3667E+06     fraction sampled = 6.2727E-01

xxθθxx   2289093.0 ( 49.2)       xxθΩxx   1254877.0 ( 74.2)
xxΩΩxx    143467.0 ( 93.4)       xxθθxS    985974.9 ( 74.2)
xxθΩxS    450896.3 ( 93.4)       xxΩΩxS     43710.7 ( 99.6)
xxθθSS    132853.4 ( 93.4)       xxθΩSS     51516.1 ( 99.6)
xxΩΩSS      4703.9 (100.0)       xSθθSS      6746.2 ( 99.6)
xSθΩSS      2464.0 (100.0)       xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)       SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Library size = 3.0000E+07

total 7.8961E+06     fraction sampled = 9.2291E-01

xxθθxx   4040589.0 ( 86.9)       xxθΩxx   1661409.0 ( 98.3)
xxΩΩxx    153619.1 (100.0)       xxθθxS   1305393.0 ( 98.3)
xxθΩxS    482802.9 (100.0)       xxΩΩxS     43904.0 (100.0)
xxθθSS    142254.4 (100.0)       xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)       xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)       xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)       SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)
Table 131: Sampling of a Library
Encoded by (NNT)4(NNG)2
(continued)

Library size = 5.0000E+07

total 8.3956E+06     fraction sampled = 9.8130E-01

xxθθxx   4491779.0 ( 96.6)       xxθΩxx   1688387.0 ( 99.9)
xxΩΩxx    153663.8 (100.0)       xxθθxS   1326590.0 ( 99.9)
xxθΩxS    482943.4 (100.0)       xxΩΩxS     43904.0 (100.0)
xxθθSS    142295.8 (100.0)       xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)       xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)       xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)       SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Library size = 1.0000E+08

total 8.5503E+06     fraction sampled = 9.9938E-01

xxθθxx   4643063.0 ( 99.9)       xxθΩxx   1690302.0 (100.0)
xxΩΩxx    153664.0 (100.0)       xxθθxS   1328094.0 (100.0)
xxθΩxS    482944.0 (100.0)       xxΩΩxS     43904.0 (100.0)
xxθθSS    142296.0 (100.0)       xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)       xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)       xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)       SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

(The amino acids referred to in Table 131 need not
be in sequence, but if they are, the sequences all
have SEQ ID NO: 88).
Table 132: Relative efficiencies of
various simple variegation codons

                               Number of codons
vgCodon               5                6                7
                  #DNA/#AA         #DNA/#AA         #DNA/#AA
                   [#DNA]           [#DNA]           [#DNA]
                   (#AA)            (#AA)            (#AA)

NNK                 8.95            13.86            21.49
assuming         [2.86·10^7]      [8.87·10^8]      [2.75·10^10]
stops vanish     (3.2·10^6)       (6.4·10^7)       (1.28·10^9)

NNT                 1.38             1.47             1.57
                 [1.05·10^6]      [1.68·10^7]      [2.68·10^8]
                 (7.59·10^5)      (1.14·10^7)      (1.71·10^8)

NNG                 2.04             2.36             2.72
assuming         [7.59·10^5]      [1.14·10^7]      [1.71·10^8]
stops vanish     (3.7·10^5)       (4.83·10^6)      (6.27·10^7)
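
The ratios in Table 132 can be checked from the codon and amino-acid counts of each variegation scheme. The sketch below is an illustration in Python, with the hypothetical names `SCHEMES` and `dna_per_peptide`; it assumes 31 stop-free codons and 20 amino acids for NNK, 16 codons and 15 amino acids for NNT, and 15 stop-free codons and 13 amino acids for NNG.

```python
# (stop-free codons per position, distinct amino acids per position)
SCHEMES = {
    "NNK": (31, 20),
    "NNT": (16, 15),
    "NNG": (15, 13),
}


def dna_per_peptide(scheme, n_codons):
    """Return (#DNA/#AA, #DNA, #AA) for n_codons variegated positions."""
    codons, amino_acids = SCHEMES[scheme]
    n_dna = codons ** n_codons
    n_aa = amino_acids ** n_codons
    return n_dna / n_aa, n_dna, n_aa


if __name__ == "__main__":
    for scheme in SCHEMES:
        for n in (5, 6, 7):
            ratio, n_dna, n_aa = dna_per_peptide(scheme, n)
            print(f"{scheme} x{n}: {ratio:6.2f}  [{n_dna:.2e}]  ({n_aa:.2e})")
```

A ratio close to 1 means that few transformants are spent on redundant DNA sequences, which is why NNT compares favourably with NNK in the table.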
Table 140. Effect of anti-BPTI IgG on phage titer.

Phage Strain    Input      +Anti-BPTI    +Anti-BPTI        Eluted
                                         +Protein A (a)    Phage

M13MP18         100 (b)    98            92                7·10^-4
BPTI.3          100        26            21                6
M13MB48 (c)     100        90            36                0.8
M13MB48 (d)     100        60            40                2.6

(a) Protein A-agarose beads.

(b) Percentage of input phage measured as plaque
forming units

(c) Batch number 3

(d) Batch number 4



Table 141. Effect of anti-BPTI or protein A on
phage titer.



Strain 



Input 



No 

Addition 



+Anti- 
BPTI 



+Protein A 
(a) 



+Ant i - 

BPTI 
+Protein A 



M13MP18 
M13MB48 (b) 



100 (b) 
100 



107 
92 



105 
7 . 10" 



72 
58 



65 
<10~ 



(a) Protein A-agarose beads 

(b) Percentage of input phage measured as plaque 
forming 

units 

(c) Batch number 5 
Table 142. Effect of anti-BPTI and non-immune serum
on phage titer

Strain         Input     +Anti-BPTI   +NRS (a)   +Anti-BPTI        +NRS
                                                 +Protein A (b)    +Protein A

M13MP18        100 (c)   65           104        71                88
M13MB48 (d)    100       30           125        13                121
M13MB48 (e)    100       2            105        0.7               110

(a) Purified IgG from normal rabbit serum.

(b) Protein A-agarose beads.

(c) Percentage of input phage measured as plaque
forming units

(d) Batch number 4

(e) Batch number 5
Table 143. Loss in titer of display phage with
anhydrotrypsin.

              Anhydrotrypsin Beads        Streptavidin Beads
Strain        Start      Post             Start      Post
                         Incubation                  Incubation

M13MP18       100 (a)    121              ND         ND
M13MB48       100        58               100        98
5AA Pool      100        44               100        93

(a) Plaque forming units expressed as a percentage
of input.

Table 144. Binding of Display Phage to
Anhydrotrypsin.

Experiment 1.

Strain         Eluted Phage (a)    Relative to
                                   M13MP18

M13MP18         0.2                 1.0
BPTI-IIIMK      7.9                39.5
M13MB48        11.2                56.0

Experiment 2.

Strain         Eluted Phage (a)    Relative to
                                   M13mp18

M13mp18         0.3                 1.0
BPTI-IIIMK     12.0                40.0
M13MB56        17.0                56.7

(a) Plaque forming units acid eluted from beads,
expressed as a percentage of the input.
Table 145. Binding of Display Phage to
Anhydrotrypsin or Trypsin.

              Anhydrotrypsin Beads           Trypsin Beads
Strain        Eluted       Relative          Eluted       Relative
              Phage (a)    Binding (b)       Phage        Binding

M13MP18        0.1           1               2.3x10^-4    1.0
BPTI-IIIMK     9.1          91               1.17         5x10^3
M13.3X7       25.0         250               1.4          6x10^3
M13.3X11       9.2          92               0.27         1.2x10^3

(a) Plaque forming units eluted from beads,
expressed as a percentage of the input.

(b) Relative to the non-display phage, M13MP18.

Table 146. Binding of Display Phage to Trypsin or
Human Neutrophil Elastase.

              Trypsin Beads                  HNE Beads
Strain        Eluted       Relative          Eluted       Relative
              Phage (a)    Binding (b)       Phage        Binding

M13MP18       5x10^-4         1              3x10^-4      1.0
BPTI-IIIMK    1.0          2000              5x10^-3      16.7
M13MB48       0.13          260              9x10^-3      30.0
M13.3X7       1.15         2300              1x10^-3      3.3
M13.3X11      0.8          1600              2x10^-3      6.7
BPTI3.CL (c)  1x10^-3         2              4.1          1.4x10^4

(a) Plaque forming units acid eluted from the beads,
expressed as a percentage of input.

(b) Relative to the non-display phage, M13MP18.

(c) BPTI-IIIMK (K15L MGNG)
Table 155

Distance in Å between alpha carbons in octapeptides:

Extended Strand: angle of Cα1-Cα2-Cα3 = 138°



10 





1 

_L 


2 




•3 


4 










7 


Q 
o 


1 
























2 


3 . 8 






















3 


7 . 1 


3 . 8 




















4 


10 . 7 


7. 1 


3 


. 8 
















5 


14 . 2 


10. 7 


7 


. 1 


3 . 8 














6 


17.7 


14'. 1 


10 


. 7 


7 . 1 


"5 ft 












7 


21.2 


17 . 7 


14 


. 1 


10 . 6 


/ . U 




Q 
. O 








8 


24 . 6 


20 . 9 


17 


. 5 


13 . 9 


in a 
1U . o 


•-7 
/ 


. U 


o 

-3 


Q 
. O 




Reverse turn between residues 4 and 5.

       1      2      3      4      5      6      7      8
1
2     3.8
3     7.1    3.8
4    10.6    7.0    3.8
5    11.6    8.0    6.1    3.8
6     9.0    5.8    5.5    5.6    3.8
7     6.2    4.1    6.3    8.0    7.0    3.8
8     5.8    6.0    9.1   11.6   10.7    7.2    3.8

Alpha helix: angle of Cα1-Cα2-Cα3 = 93°

1
2     3.8
3     5.5    3.8
4     5.1    5.4    3.8
5     6.6    5.3    5.5    3.8
6     9.3    7.0    5.6    5.5    3.8
7    10.4    9.3    6.9    5.4    5.5    3.8
8    11.3   10.7    9.5    6.8    5.6    5.6    3.8



Table 156

Distances between alpha carbons in closed mini-proteins
of the form disulfide cyclo (CXXXXC)

Minimum distance

1
2    3.8
3    5.9    3.8
4    5.6    6.0    3.8
5    4.7    5.9    6.0    3.8
6    4.8    5.3    5.1    5.2    3.8

Average distance

1
2    3.8
3    6.3    3.8
4    7.5    6.4    3.8
5    7.1    7.5    6.3    3.8
6    5.6    7.5    7.7    6.4    3.8

Maximum distance

1
2    3.8
3    6.7    3.8
4    9.0    6.9    3.8
5    8.7    8.8    6.8    3.8
6    6.6    9.2    9.1    6.8    3.8
Table 160: pH Profile of BPTI-III MK phage and
EpiNE1 phage binding to Cat G beads.

BPTI-III MK (BPTI has SEQ ID NO: 44)

pH     Total pfu in Fraction     Percentage of Input

7      3.7 x 10^5                3.7 x 10^-2
6      3.1 x 10^5                3.1 x 10^-2
5      1.4 x 10^5                1.4 x 10^-2
4.5    3.1 x 10^4                3.1 x 10^-3
4      7.1 x 10^3                7.1 x 10^-4
3.5    2.6 x 10^3                2.6 x 10^-4
3      2.5 x 10^3                2.5 x 10^-4
2.5    8.8 x 10^2                8.8 x 10^-5
2      7.6 x 10^2                7.6 x 10^-5

(total input = 1 x 10^9 phage)

EpiNE1 (EpiNE1 has SEQ ID NO: 51)

7      2.5 x 10^5                1.1 x 10^-2
6      6.3 x 10^4                2.7 x 10^-3
5      7.4 x 10^4                3.1 x 10^-3
4.5    7.1 x 10^4                3.0 x 10^-3
4      4.1 x 10^4                1.7 x 10^-3
3.5    3.3 x 10^4                1.4 x 10^-3
3      2.5 x 10^3                1.1 x 10^-4
2.5    1.4 x 10^4                5.7 x 10^-4
2      5.2 x 10^3                2.2 x 10^-4



(total input = 2.35 x 10^9 phage).
TABLE 201

Elution of Bound Fusion Phage from Immobilized Active Trypsin

Type of        Buffer   Total Plaque-Forming   Percent of     Ratio
Phage                   Units Recovered in     Input Phage
                        Elution Buffer         Recovered

BPTI-III MK    CBS      8.80·10^7              4.7·10^-1      1675
MK             CBS      1.35·10^6              2.8·10^-4
BPTI-III MK    TBS      1.32·10^8              7.2·10^-1      2103
MK             TBS      1.48·10^6              3.4·10^-4





The total input for BPTI-III MK phage was 1.85·10^10
plaque-forming units while the input for MK phage
was 4.65·10^11 plaque-forming units.
TABLE 202

Elution of BPTI-III MK and BPTI (K15L)-III MA Phage
from Immobilized Trypsin and HNE

Type of                Immobilized   Total Plaque-       Percentage
Phage                  Protease      Forming Units in    of Input Phage
                                     Elution Fraction    Recovered

BPTI-III MK            Trypsin       2.1·10^7            4.1·10^-1
BPTI-III MK            HNE           2.6·10^5            5·10^-3
BPTI (K15L)-III MA     Trypsin       5.2·10^4            5·10^-3
BPTI (K15L)-III MA     HNE           1.0·10^6            1.0·10^-1

The total input of BPTI-III MK phage was 5.1·10^9 pfu
and the input of BPTI (K15L)-III MA phage was 9.6·10^8
pfu.
TABLE 203

Effect of pH on the Dissociation of
Bound BPTI-III MK and
BPTI (K15L)-III MA Phage from Immobilized HNE

        BPTI-III MK                      BPTI (K15L)-III MA

pH      Total Plaque-     % of Input     Total Plaque-     % of Input
        Forming Units     Phage          Forming Units     Phage
        in Fraction                      in Fraction

7.0     5.0·10^4          2·10^-3        1.7·10^5          3.2·10^-2
6.0     3.8·10^4          2·10^-3        4.5·10^5          8.6·10^-2
5.0     3.5·10^4          1·10^-3        2.1·10^6          4.0·10^-1
4.0     3.0·10^4          1·10^-3        4.3·10^6          8.2·10^-1
3.0     1.4·10^4          1·10^-3        1.1·10^6          2.1·10^-1
2.2     2.9·10^4          1·10^-3        5.9·10^4          1.1·10^-2

Percentage of Input                      Percentage of Input
Phage Recovered = 8.0·10^-3              Phage Recovered = 1.56

The total input of BPTI-III MK phage was
0.030 ml x (8.6·10^10 pfu/ml) = 2.6·10^9.

The total input of BPTI (K15L)-III MA phage was
0.030 ml x (1.7·10^10 pfu/ml) = 5.2·10^8.

Given that the infectivity of BPTI (K15L)-III MA
phage is 5 fold lower than that of BPTI-III MK
phage, the phage inputs utilized above ensure that
an equivalent number of phage particles are added to
the immobilized HNE.
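
The input and percentage figures in Table 203 follow from two short calculations: total input is the applied volume times the stock titer, and each fraction is the recovered pfu divided by that input. The following is a minimal Python sketch of that arithmetic using the BPTI-III MK values quoted above; the function names are illustrative and do not come from the source.

```python
def total_input(volume_ml: float, titer_pfu_per_ml: float) -> float:
    """Total phage applied to the immobilized protease, in pfu."""
    return volume_ml * titer_pfu_per_ml


def percent_of_input(recovered_pfu: float, input_pfu: float) -> float:
    """Recovered phage expressed as a percentage of the input."""
    return 100.0 * recovered_pfu / input_pfu


# Values taken from Table 203 (BPTI-III MK phage, pH 7.0 fraction).
mk_input = total_input(0.030, 8.6e10)        # roughly 2.6e9 pfu
mk_ph7 = percent_of_input(5.0e4, mk_input)   # roughly 2e-3 percent

print(f"input = {mk_input:.1e} pfu, pH 7.0 fraction = {mk_ph7:.0e} % of input")
```

The same two functions reproduce the other rows to within the rounding used in the table.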
TABLE 204

Effect of Mutation of Residues 39 to 42 of BPTI
on the Ability of BPTI (K15L)-III MA to Bind to
Immobilized HNE

        BPTI (K15L)-III MA               BPTI (K15L,MGNG)-III MA

pH      Total Plaque-     % Input        Total Plaque-     % Input
        Forming Units                    Forming Units

7.0     3.0·10^5          8.2·10^-2      4.5·10^5          1.63·10^-1
6.0     3.6·10^5          1.00·10^-1     6.3·10^5          2.21·10^-1
5.5     5.3·10^5          1.46·10^-1     7.3·10^5          2.64·10^-1
5.0     5.6·10^5          1.52·10^-1     8.7·10^5          3.16·10^-1
4.75    9.9·10^5          2.76·10^-1     1.3·10^6          4.60·10^-1
4.5     3.1·10^5          8.5·10^-2      3.6·10^5          1.30·10^-1
4.25    5.2·10^5          1.42·10^-1     5.0·10^5          1.80·10^-1
4.0     5.1·10^4          1.4·10^-2      1.3·10^5          4.8·10^-2
3.5     1.3·10^4          4·10^-3        3.8·10^4          1.4·10^-2

Total Percentage                         Total Percentage
Recovered = 1.00                         Recovered = 1.80

The total input of BPTI (K15L)-III MA phage was
0.030 ml x (1.2·10^10 pfu/ml) = 3.6·10^8 pfu.

The total input of BPTI (K15L,MGNG)-III MA phage was
0.030 ml x (9.2·10^9 pfu/ml) = 2.8·10^8 pfu.
TABLE 205

Fractionation of a Mixture of
BPTI-III MK and
BPTI (K15L,MGNG)-III MA Phage
on Immobilized HNE

        BPTI-III MK                      BPTI (K15L,MGNG)-III MA

pH      Total Kanamycin   % of Input     Total Ampicillin  % of Input
        Transducing                      Transducing
        Units                            Units

7.0     4.01·10^3         4.5·10^-3      1.39·10^5         3.13·10^-1
6.0     7.06·10^2         8·10^-4        7.18·10^4         1.62·10^-1
5.0     1.81·10^3         2.0·10^-3      1.35·10^5         3.04·10^-1
4.0     1.49·10^3         1.7·10^-3      7.43·10^5         1.673

The total input of BPTI-III MK phage was
0.015 ml x (5.94·10^9 kanamycin transducing units/ml)
= 8.91·10^7 kanamycin transducing units.

The total input of BPTI (K15L,MGNG)-III MA phage was
0.015 ml x (2.96·10^9 ampicillin transducing units/ml)
= 4.44·10^7 ampicillin transducing units.
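
One way to read Table 205 is to divide, for each pH fraction, the percentage of input recovered for the MA phage by that recovered for the MK phage. The sketch below does this in Python with the percentages from the table; the "recovery ratio" is an illustrative quantity introduced here, not a figure reported in the source.

```python
# Percent-of-input values from Table 205, keyed by elution pH:
# (BPTI-III MK, BPTI (K15L,MGNG)-III MA)
table_205 = {
    7.0: (4.5e-3, 3.13e-1),
    6.0: (8e-4, 1.62e-1),
    5.0: (2.0e-3, 3.04e-1),
    4.0: (1.7e-3, 1.673),
}


def recovery_ratio(percent_ma: float, percent_mk: float) -> float:
    """Ratio of MA-phage recovery to MK-phage recovery in one fraction."""
    return percent_ma / percent_mk


ratios = {ph: recovery_ratio(ma, mk) for ph, (mk, ma) in table_205.items()}

for ph in sorted(ratios, reverse=True):
    print(f"pH {ph}: MA/MK recovery ratio = {ratios[ph]:.0f}")
```

On these numbers the ratio is largest in the pH 4.0 fraction, where the MA phage recovery is highest.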
TABLE 206

Characterization of the Affinity of
BPTI (K15V,R17L)-III MA Phage for Immobilized HNE

        BPTI (K15V,R17L)-III MA          BPTI (K15L,MGNG)-III MA

pH      Total Plaque-     Percentage     Total Plaque-     Percentage
        Forming Units     of Input       Forming Units     of Input
        Recovered         Phage          Recovered         Phage

7.0     3.19·10^6         8.1·10^-2      9.42·10^4         4.6·10^-2
6.0     5.42·10^6         1.38·10^-1     1.61·10^5         7.9·10^-2
5.0     9.45·10^6         2.41·10^-1     2.85·10^5         1.39·10^-1
4.5     1.39·10^7         3.55·10^-1     4.32·10^5         2.11·10^-1
4.0     2.02·10^7         5.15·10^-1     1.42·10^5         6.9·10^-2
3.75    9.20·10^6         2.35·10^-1     -                 -
3.5     4.16·10^6         1.06·10^-1     5.29·10^4         2.6·10^-2
3.0     2.65·10^6         6.8·10^-2

Total Input                              Total Input
Recovered = 1.73                         Recovered = 0.57

Total input of BPTI (K15V,R17L)-III MA phage was
0.040 ml x (9.80·10^10 pfu/ml) = 3.92·10^9 pfu.

Total input of BPTI (K15L,MGNG)-III MA phage was
0.040 ml x (5.13·10^9 pfu/ml) = 2.05·10^8 pfu.
TABLE 207

Sequence of the EpiNEa Clone Selected
From the Mini-Library

 1   1   1   1   1   1   1   2   2
 3   4   5   6   7   8   9   0   1
 P   C   V   A   M   F   Q   R   Y
CCT.TGC.GTG.GCT.ATG.TTC.CAA.CGC.TAT

(SEQ ID NO:45)
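
The amino-acid row of Table 207 can be checked against its codon row by translating the DNA with the standard genetic code. The following Python sketch is illustrative only: the `translate` helper and the compact table construction are assumptions made for this example, not part of the source.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dotted_codons: str) -> str:
    """Translate a dot-separated codon string (as printed in the tables)."""
    codons = [c.strip().upper() for c in dotted_codons.split(".") if c.strip()]
    return "".join(CODON_TABLE[c] for c in codons)


# Codons for residues 13 to 21 as listed in Table 207.
peptide = translate("CCT.TGC.GTG.GCT.ATG.TTC.CAA.CGC.TAT")
print(peptide)
```

The same helper can be applied to the codon rows of Tables 208 and 209 to confirm that each listed residue matches its codon.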
TABLE 208

SEQUENCES OF THE EpiNE CLONES IN THE P1 REGION

CLONE                   SEQUENCE
IDENTIFIERS

3, 9, 16,               EpiNE3 (amino-acid: SEQ ID NO: 46)
17, 18, 19               1   1   1   1   1   1   1   2   2
                         3   4   5   6   7   8   9   0   1
                         P   C   V   G   F   F   S   R   Y
                        CCT.TGC.GTC.GGT.TTC.TTC.TCA.CGC.TAT
                        (DNA: SEQ ID NO: 109)

                        EpiNE6 (amino-acid: SEQ ID NO: [illegible])
                         1   1   1   1   1   1   1   2   2
                         3   4   5   6   7   8   9   0   1
                         P   C   V   G   F   F   Q   R   Y
                        CCT.TGC.GTC.GGT.TTC.TTC.CAA.CGC.TAT
                        (DNA: SEQ ID NO: 110)

7, 13, 14,              EpiNE7 (amino-acid: SEQ ID NO: 48)
15, 20                   1   1   1   1   1   1   1   2   2
                         3   4   5   6   7   8   9   0   1
                         P   C   V   A   M   F   P   R   Y
                        CCT.TGC.GTC.GCT.ATG.TTC.CCA.CGC.TAT
                        (DNA: SEQ ID NO: 111)

                        EpiNE4 (amino-acid: SEQ ID NO: [illegible])
                         1   1   1   1   1   1   1   2   2
                         3   4   5   6   7   8   9   0   1
                         P   C   V   A   I   F   P   R   Y
                        CCT.TGC.GTC.GCT.ATC.TTC.CCA.CGC.TAT
                        (DNA: SEQ ID NO: 112)
TABLE 208

SEQUENCES OF THE EpiNE CLONES IN THE P1 REGION
(continued)

CLONE                   SEQUENCE
IDENTIFIERS

                        EpiNE8 (amino-acid: SEQ ID NO: 56)
                         1   1   1   1   1   1   1   2   2
                         3   4   5   6   7   8   9   0   1
                         P   C   V   A   I   F   K   R   S
                        CCT.TGC.GTC.GCT.ATC.TTC.AAA.CGC.TCT
                        (DNA: SEQ ID NO: 113)

1, 10,                  EpiNE1 (amino-acid: SEQ ID NO: 51)
11, 12                   1   1   1   1   1   1   1   2   2
                         3   4   5   6   7   8   9   0   1
                         P   C   I   A   F   F   P   R   Y
                        CCT.TGC.ATC.GCT.TTC.TTC.CCA.CGC.TAT
                        (DNA: SEQ ID NO: 114)

                        EpiNE5 (amino-acid: SEQ ID NO: 52)
                         1   1   1   1   1   1   1   2   2
                         3   4   5   6   7   8   9   0   1
                         P   C   I   A   F   F   Q   R   Y
                        CCT.TGC.ATC.GCT.TTC.TTC.CAA.CGC.TAT
                        (DNA: SEQ ID NO: 115)

                        EpiNE2 (amino-acid: SEQ ID NO: [illegible])
                         1   1   1   1   1   1   1   2   2
                         3   4   5   6   7   8   9   0   1
                         P   C   I   A   L   F   K   R   Y
                        CCT.TGC.ATC.GCT.TTG.TTC.AAA.CGC.TAT
                        (DNA: SEQ ID NO: 116)
Table 209: DNA sequences and predicted amino acid
sequences around the P1 region of BPTI analogues selected
for binding to Cathepsin G.

Clone                          P1
                               15    16    17    18    19

BPTI (SEQ ID NO: [illegible])  AAA . GCG . CGC . ATC . ATC
(SEQ ID NO: 44)                LYS   ALA   ARG   ILE   ILE

EpiC 1 (a)                     ATG . GGT . TTC . TCC . AAA    SEQ ID NO: 117
(SEQ ID NO: [illegible])       MET   GLY   PHE   SER   LYS

EpiC 7                         ATG . GCT . TTG . TTC . AAA    SEQ ID NO: 118
(SEQ ID NO: 55)                MET   ALA   LEU   PHE   LYS

EpiC 8 (b)                     TTC . GCT . ATC . ACC . CCA    SEQ ID NO: 119
(SEQ ID NO: [illegible])       PHE   ALA   ILE   THR   PRO

EpiC 10                        ATG . GCT . TTG . TTC . CAA    SEQ ID NO: 120
(SEQ ID NO: [illegible])       MET   ALA   LEU   PHE   GLN

EpiC 20                        ATG . GCT . ATC . TCC . CCA    SEQ ID NO: 121
(SEQ ID NO: [illegible])       MET   ALA   ILE   SER   PRO

(a) Clones 11 and 31 also had the identical sequence.

(b) Clone 8 also contained the mutation Tyr 10 to Asn.
Table 210

Derivatives of EpiNE7 (SEQ ID NO: 48) Obtained
by Variegation at positions 34, 36, 39, 40 and 41

EpiNE7 (SEQ ID NO: 48)
              ♦♦♦♦♦                   ****
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFVYGGCmgngNNFKSAEDCMRTCGGA
         1         2         3         4         5
1234567890123456789012345678901234567890123456789012345678

EpiNE7.6 (SEQ ID NO: 59)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFlYgGCkgkGNNFKSAEDCMRTCGGA

EpiNE7.8, EpiNE7.9, and EpiNE7.31 (SEQ ID NO: 60)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFeYgGCwakGNNFKSAEDCMRTCGGA

EpiNE7.11 (SEQ ID NO: 61)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFgYaGCrakGNNFKSAEDCMRTCGGA

EpiNE7.7 (SEQ ID NO: 62)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFeYgGChaeGNNFKSAEDCMRTCGGA

EpiNE7.4 and EpiNE7.14 (SEQ ID NO: 63)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFlYgGCwaqGNNFKSAEDCMRTCGGA

EpiNE7.5 (SEQ ID NO: 64)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFrYgGClaeGNNFKSAEDCMRTCGGA

EpiNE7.10 and EpiNE7.20 (SEQ ID NO: 65)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFdYgGChadGNNFKSAEDCMRTCGGA

EpiNE7.1 (SEQ ID NO: 66)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGClahGNNFKSAEDCMRTCGGA

EpiNE7.16 (SEQ ID NO: 67)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFtYgGCwanGNNFKSAEDCMRTCGGA

EpiNE7.19 (SEQ ID NO: 68)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFnYgGCegkGNNFKSAEDCMRTCGGA

EpiNE7.12 (SEQ ID NO: 69)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFqYgGCegyGNNFKSAEDCMRTCGGA

EpiNE7.17 (SEQ ID NO: 70)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFqYgGClgeGNNFKSAEDCMRTCGGA

EpiNE7.21 (SEQ ID NO: 71)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFhYgGCwgqGNNFKSAEDCMRTCGGA
Table 210: Derivatives of EpiNE7 (SEQ ID NO: 48) Obtained
by Variegation at positions 34, 36, 39, 40 and 41

(continued)

              ♦♦♦♦♦                   ****
EpiNE7 (SEQ ID NO: 48)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFVYGGCmgngNNFKSAEDCMRTCGGA
         1         2         3         4         5
1234567890123456789012345678901234567890123456789012345678

UiU ♦ ♦ 

EpiNE7.22 (SEQ ID NO: 72)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFhYgGCwgeGNNFKSAEDCMRTCGGA

EpiNE7.23 (SEQ ID NO: 73)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGCwgkGNNFKSAEDCMRTCGGA

EpiNE7.24 (SEQ ID NO: 74)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGChgnGNNFKSAEDCMRTCGGA

EpiNE7.25 (SEQ ID NO: 75)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFpYgGCwakGNNFKlAEDCMRTCGGA

EpiNE7.26 (SEQ ID NO: 76)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGCwghGNNFKSAEDCMRTCGGA

EpiNE7.27 (SEQ ID NO: 77)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFnYgGCwgkGNNFKSAEDCMRTCGGA

EpiNE7.28 (SEQ ID NO: 78)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFtYgGClghGNNFKSAEDCMRTCGGA

EpiNE7.29 (SEQ ID NO: 79)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFtYgGClgyGNNFKSAEDCMRTCGGA

EpiNE7.30, EpiNE7.34, and EpiNE7.35 (SEQ ID NO: 80)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFkYgGCwaeGNNFKSAEDCMRTCGGA

EpiNE7.32 (SEQ ID NO: 81)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFgYgGCwgeGNNFKSAEDCMRTCGGA

EpiNE7.33 (SEQ ID NO: 82)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFeYgGCwanGNNFKSAEDCMRTCGGA

EpiNE7.36 (SEQ ID NO: 83)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFvYgGChgdGNNFKSAEDCMRTCGGA

EpiNE7.37 (SEQ ID NO: 84)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFmYgGCqgkGNNFKSAEDCMRTCGGA



Table 210 (continued)
Derivatives of EpiNE7 (SEQ ID NO: 48) Obtained
by Variegation at positions 34, 36, 39, 40 and 41

EpiNE7.38 (SEQ ID NO: 85)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFyYgGCwakGNNFKSAEDCMRTCGGA

EpiNE7 (SEQ ID NO: 48)
              ♦♦♦♦♦                   ****
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFVYGGCmgngNNFKSAEDCMRTCGGA
         1         2         3         4         5
1234567890123456789012345678901234567890123456789012345678

EpiNE7.39 (SEQ ID NO: 86)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFmYgGCwgdGNNFKSAEDCMRTCGGA

EpiNE7.40 (SEQ ID NO: 87)
RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFtYgGChgnGNNFKSAEDCMRTCGGA
Table 210: Derivatives of EpiNE7 Obtained
by Variegation at positions 34, 36, 39, 40 and 41

(continued)

Notes:

a) ♦ indicates variegated residue. * indicates
imposed change. [symbol illegible] indicates carry over from EpiNE7.

b) The sequence M39-GNG in EpiNE7 (indicated by *)
was imposed to increase similarity to ITI-D1.

c) Lower case letters in EpiNE7.6 to 7.38 indicate
changes from BPTI that were selected in the first
round (residues 15-19) or positions where the PBD
was variegated in the second round (residues 34, 36,
39, 40, and 41).

d) All EpiNE7 derivatives have G42.
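
The derivatives in Table 210 differ from EpiNE7 only at a handful of positions, which is easiest to see by comparing each sequence with the parent residue by residue. The Python sketch below does this for EpiNE7.6; the `substitutions` helper is hypothetical and introduced only for illustration, and letter case is ignored because the table uses lower case purely as a marker.

```python
def substitutions(parent: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (position, parent residue, variant residue) for each difference.

    Positions are 1-based, matching the numbering ruler in Table 210.
    """
    parent, variant = parent.upper(), variant.upper()
    if len(parent) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        (pos, p, v)
        for pos, (p, v) in enumerate(zip(parent, variant), start=1)
        if p != v
    ]


# Sequences as listed in Table 210 (58 residues each).
EPINE7 = "RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFVYGGCmgngNNFKSAEDCMRTCGGA"
EPINE7_6 = "RPDFCLEPPYTGPCvAmfpRYFYNAKAGLCQTFlYgGCkgkGNNFKSAEDCMRTCGGA"

for pos, old, new in substitutions(EPINE7, EPINE7_6):
    print(f"{old}{pos}{new}")
```

For this pair the differences fall at positions 34, 39 and 41, all within the set of positions that the table title lists as variegated.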
TABLE 211

Effects of antisera on phage infectivity

Phage             Incubation     pfu/ml         Relative
(dilution         Conditions                    Titer
of stock)

MA-ITI            PBS            1.2·10^11      1.00
(10^-1)           NRS            6.8·10^10      0.57
                  anti-ITI       1.1·10^10      0.09

MA-ITI            PBS            7.7·10^8       1.00
(10^-3)           NRS            6.7·10^8       0.87
                  anti-ITI       8.0·10^6       0.01

MA                PBS            1.3·10^12      1.00
(10^-1)           NRS            1.4·10^12      1.10
                  anti-ITI       1.6·10^12      1.20

MA                PBS            1.3·10^10      1.00
(10^-3)           NRS            1.2·10^10      0.92
                  anti-ITI       1.5·10^10      1.20
TABLE 212

Fractionation of EpiNE-7 and MA-ITI phage on HNE beads

               EpiNE-7                     MA-ITI

Sample         Total pfu     Fraction      Total pfu     Fraction
               in sample     of input      in sample     of input

INPUT          3.3·10^9      1.00          3.4·10^11     1.00

Final
TBS-TWEEN
wash           3.8·10^5      1.2·10^-4     1.8·10^6      5.3·10^-6
pH 7.0         6.2·10^5      1.8·10^-4     1.6·10^6      4.7·10^-6
pH 6.0         1.4·10^6      4.1·10^-4     1.0·10^6      2.9·10^-6
pH 5.5         9.4·10^5      2.8·10^-4     1.6·10^6      4.7·10^-6
pH 5.0         9.5·10^5      2.9·10^-4     3.1·10^5      9.1·10^-7
pH 4.5         1.2·10^6      3.5·10^-4     1.2·10^5      3.5·10^-7
pH 4.0         1.6·10^6      4.8·10^-4     7.2·10^4      2.1·10^-7
pH 3.5         9.5·10^5      2.9·10^-4     4.9·10^4      1.4·10^-7
pH 3.0         6.6·10^5      2.0·10^-4     2.9·10^4      8.5·10^-8
pH 2.5         1.6·10^5      4.8·10^-5     1.4·10^4      4.1·10^-8
pH 2.0         3.0·10^5      9.1·10^-5     1.7·10^4      5.0·10^-8

SUM*           6.4·10^6      3·10^-3       5.7·10^6      2·10^-5

*SUM is the total pfu (or fraction of input) obtained
from all pH elution fractions
TABLE 213

Fractionation of EpiC-10 and MA-ITI phage on Cat-G
beads

               EpiC-10                     MA-ITI

Sample         Total pfu     Fraction      Total pfu     Fraction
               in sample     of input      in sample     of input

INPUT          5.0·10^11     1.00          4.6·10^11     1.00

Final
TBS-TWEEN
Wash           [illegible]   [illegible]   [illegible]   [illegible]
pH 7.0         1.5·10^7      3.0·10^-5     6.1·10^6      1.3·10^-5
pH 6.0         2.3·10^7      4.6·10^-5     2.3·10^6      5.0·10^-6
pH 5.5         2.5·10^7      5.0·10^-5     1.2·10^6      2.6·10^-6
pH 5.0         2.1·10^7      4.2·10^-5     1.1·10^6      2.4·10^-6
pH 4.5         1.1·10^7      2.2·10^-5     6.7·10^5      1.5·10^-6
pH 4.0         1.9·10^6      3.8·10^-6     4.4·10^5      9.6·10^-7
pH 3.5         1.1·10^6      2.2·10^-6     4.4·10^5      9.6·10^-7
pH 3.0         4.8·10^5      9.6·10^-7     3.6·10^5      7.8·10^-7
pH 2.5         2.0·10^5      4.0·10^-7     2.7·10^5      5.9·10^-7
pH 2.0         2.4·10^5      4.8·10^-7     3.2·10^5      7.0·10^-7

SUM*           9.9·10^7      2·10^-4       1.4·10^7      3·10^-5

*SUM is the total pfu (or fraction of input) obtained
from all pH elution fractions
TABLE 214



EPiNE-7 



DISPLAY PHAGE 
M^ni 2 MA-ITI-E7 1 MA-ITI-E7 



1.00 i- 00 



(1 . 1 8 ; 0 1 ° o9) ( x' 2 °io") ( !:!: io^^ 

— zr riF 2^ 

!'i 0 0 - 4 1-10- 2-10" 5 

pH 7.0 3-10 -5 8-10- 5 

-3 ,.ifl' 6 8-10 5 8 1U 

PH 3-5 3-10 3 10 6 _ iQ _ 6 2 .^s 



n . 10 -3 1-10- 6 ^^lO^ 

* SUM is the total fraction of input pfu obtained from
all pH elution fractions
TABLE 215

Fractionation of EpiNE-7 and MA-ITI-E7 phage on HNE
beads

               EpiNE-7                     MA-ITI-E7

Sample         Total pfu     Fraction      Total pfu     Fraction
               in sample     of input      in sample     of input

INPUT          1.8·10^9      1.00          3.0·10^9      1.00
pH 7.0         5.2·10^5      2.9·10^-4     6.4·10^4      2.1·10^-5
pH 6.0         6.4·10^5      3.6·10^-4     4.5·10^4      1.5·10^-5
pH 5.5         7.8·10^5      4.3·10^-4     5.0·10^4      1.7·10^-5
pH 5.0         8.4·10^5      4.7·10^-4     5.2·10^4      1.7·10^-5
pH 4.5         1.1·10^6      6.1·10^-4     4.4·10^4      1.5·10^-5
pH 4.0         1.7·10^6      9.4·10^-4     2.6·10^4      8.7·10^-6
pH 3.5         1.1·10^6      6.1·10^-4     1.3·10^4      4.3·10^-6
pH 3.0         3.8·10^5      2.1·10^-4     5.6·10^3      1.9·10^-6
pH 2.5         2.8·10^5      1.6·10^-4     4.9·10^3      1.6·10^-6
pH 2.0         2.9·10^5      1.6·10^-4     2.2·10^3      7.3·10^-7

SUM*           7.6·10^6      4.1·10^-3     3.1·10^5      1.1·10^-4

*SUM is the total pfu (or fraction of input) obtained
from all pH elution fractions
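
As a consistency check on the SUM row, the pH elution fractions can be added and compared with the reported total. The short Python sketch below does this for the EpiNE-7 pfu column of Table 215; the variable names are illustrative, and the comparison is only expected to hold to within the two-significant-figure rounding used in the table.

```python
# EpiNE-7 total pfu in each pH elution fraction (pH 7.0 down to pH 2.0),
# as listed in Table 215.
ph_fractions_pfu = [
    5.2e5, 6.4e5, 7.8e5, 8.4e5, 1.1e6,
    1.7e6, 1.1e6, 3.8e5, 2.8e5, 2.9e5,
]
reported_sum_pfu = 7.6e6  # SUM row of the table

computed_sum_pfu = sum(ph_fractions_pfu)
relative_gap = abs(computed_sum_pfu - reported_sum_pfu) / reported_sum_pfu

print(f"computed = {computed_sum_pfu:.2e} pfu, "
      f"reported = {reported_sum_pfu:.1e} pfu, "
      f"relative gap = {relative_gap:.1%}")
```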